ISSN 1847-3938

39th international convention
May 30 - June 03, 2016, Opatija – Adriatic Coast, Croatia
Lampadem tradere
mipro - path to knowledge and innovation
mipro proceedings

MIPRO 2016
39th International Convention
May 30 – June 03, 2016, Opatija, Croatia

Proceedings

Conferences:
Microelectronics, Electronics and Electronic Technology /MEET
Distributed Computing, Visualization and Biomedical Engineering /DC VIS
Telecommunications & Information /CTI
Special Session on Future Networks and Services /FNS
Computers in Education /CE
Computers in Technical Systems /CTS
Intelligent Systems /CIS
Special Session on Biometrics & Forensics & De-Identification and Privacy Protection /BiForD
Information Systems Security /ISS
Business Intelligence Systems /miproBIS
Digital Economy and Government, Local Government, Public Services /DE-GLGPS
MIPRO Junior - Student Papers /SP

Edited by: Petar Biljanović

International Program Committee
Petar Biljanović, General Chair, Croatia; S. Amon, Slovenia; V. Anđelić, Croatia; M.E. Auer, Austria; M. Baranović, Croatia; A. Badnjević, Bosnia and Herzegovina; B. Bebel, Poland; L. Bellatreche, France; E. Brenner, Austria; A. Budin, Croatia; Ž. Butković, Croatia; Ž. Car, Croatia; M. Colnarič, Slovenia; A. Cuzzocrea, Italy; M. Čičin-Šain, Croatia; M. Delimar, Croatia; T. Eavis, Canada; M. Ferrari, Italy; B. Fetaji, Macedonia; T. Galinac Grbac, Croatia; P. Garza, Italy; L. Gavrilovska, Macedonia; M. Golfarelli, Italy; S. Golubić, Croatia; F. Gregoretti, Italy; S. Groš, Croatia; N. Guid, Slovenia; Y. Guo, United Kingdom; J. Henno, Estonia; L. Hluchy, Slovakia; V. Hudek, Croatia; Ž. Hutinski, Croatia; M. Ivanda, Croatia; H. Jaakkola, Finland; L. Jelenković, Croatia; D. Jevtić, Croatia; R. Jones, Switzerland; P. Kacsuk, Hungary; A. Karaivanova, Bulgaria; M. Mauher, Croatia; I. Mekjavić, Slovenia; B. Mikac, Croatia; V. Milutinović, Serbia; V. Mrvoš, Croatia; J.F. Novak, Croatia; J. Pardillo, Spain; N. Pavešić, Slovenia; V. Peršić, Croatia; T. Pokrajčić, Croatia; S. Ribarić, Croatia; J. Rozman, Slovenia; K. Skala, Croatia; I. Sluganović, Croatia; V. Sruk, Croatia; U. Stanič, Slovenia; N. Stojadinović, Serbia; J. Sunde, Australia; A. Szabo, IEEE Croatia Section; L. Szirmay-Kalos, Hungary; D. Šarić, Croatia; D. Šimunić, Croatia; Z. Šimunić, Croatia; D. Škvorc, Croatia; A. Teixeira, Portugal; E. Tijan, Croatia; A.M. Tjoa, Austria; R. Trobec, Slovenia; S. Uran, Croatia; T. Vámos, Hungary; M. Varga, Croatia; M. Vidas-Bubanja, Serbia; B. Vrdoljak, Croatia; D. Zazula, Slovenia
organized by
MIPRO Croatian Society

technical cosponsorship
IEEE Region 8

under the auspices of
Ministry of Science, Education and Sports of the Republic of Croatia
Ministry of Maritime Affairs, Transport and Infrastructure of the Republic of Croatia
Ministry of Entrepreneurship and Crafts of the Republic of Croatia
Ministry of Public Administration of the Republic of Croatia
Croatian Chamber of Economy
Primorsko-goranska County
City of Rijeka
City of Opatija
Croatian Regulatory Authority for Network Industries
Croatian Power Exchange - CROPEX

patrons
University of Rijeka, Croatia
University of Zagreb, Croatia
IEEE Croatia Section
IEEE Croatia Section Computer Chapter
IEEE Croatia Section Electron Devices/Solid-State Circuits Joint Chapter
IEEE Croatia Section Education Chapter
IEEE Croatia Section Communications Chapter
T-Croatian Telecom, Zagreb, Croatia
Ericsson Nikola Tesla, Zagreb, Croatia
Končar - Electrical Industries, Zagreb, Croatia
HEP - Croatian Electricity Company, Zagreb, Croatia
VIPnet, Zagreb, Croatia
University of Zagreb, Faculty of Electrical Engineering and Computing, Croatia
Ruđer Bošković Institute, Zagreb, Croatia
University of Rijeka, Faculty of Maritime Studies, Croatia
University of Rijeka, Faculty of Engineering, Croatia
University of Rijeka, Faculty of Economics, Croatia
University of Zagreb, Faculty of Organization and Informatics, Varaždin, Croatia
University of Rijeka, Faculty of Tourism and Hospitality Management, Opatija, Croatia
Polytechnic of Zagreb, Croatia
EuroCloud Croatia
Croatian Regulatory Authority for Network Industries, Zagreb, Croatia
Croatian Post, Zagreb, Croatia
Erste&Steiermärkische bank, Rijeka, Croatia
Selmet, Zagreb, Croatia
CISEx, Zagreb, Croatia
Kermas energija, Zagreb, Croatia
Rezultanta, Zagreb, Croatia
River Publishers, Aalborg, Denmark

sponsors
Ericsson Nikola Tesla, Zagreb, Croatia
T-Croatian Telecom, Zagreb, Croatia
Končar - Electrical Industries, Zagreb, Croatia
HEP - Croatian Electricity Company, Zagreb, Croatia
InfoDom, Zagreb, Croatia
Hewlett Packard Croatia, Zagreb, Croatia
IN2, Zagreb, Croatia
Transmitters and Communications Company, Zagreb, Croatia
Storm Computers, Zagreb, Croatia
Nokia, Zagreb, Croatia
VIPnet, Zagreb, Croatia
King-ICT, Zagreb, Croatia
Microsoft Croatia, Zagreb, Croatia
Micro-Link, Zagreb, Croatia
Mjerne tehnologije, Zagreb, Croatia
Altpro, Zagreb, Croatia
Danieli Automation, Buttrio, Italy
Selmet, Zagreb, Croatia
ib-proCADD, Ljubljana, Slovenia
Nomen, Rijeka, Croatia

All papers are published in their original form.

For Publisher: Petar Biljanović
Publisher: Croatian Society for Information and Communication Technology, Electronics and Microelectronics - MIPRO
Office: Kružna 8/II, P. O. Box 303, HR-51001 Rijeka, Croatia
Phone/Fax: (+385) 51 423 984
Printed by: GRAFIK, Rijeka
ISBN 978-953-233-087-8

Copyright © 2016 by MIPRO
All rights reserved. No part of this book may be reproduced in any form, nor may it be stored in a retrieval system or transmitted in any form, without written permission from the publisher.

CONTENTS

LIST OF PAPER REVIEWERS
LIST OF AUTHORS
FOREWORD

MICROELECTRONICS, ELECTRONICS AND ELECTRONIC TECHNOLOGY

INVITED PAPER

(Si)GeSn Nanostructures for Optoelectronic Device Applications ..................................... 5
I.A. Fischer, F. Oliveira, A. Benedetti, S. Chiussi, J.
Schulze PAPERS Thermoelectric Properties of Polycrystalline WS2 and Solid Solutions of WS2-ySey Types .................................................................................................................. 11 G.E. Yakovleva, A.I. Romanenko, A.S. Berdinsky, A.Yu. Ledneva, V.A. Kuznetsov, M.K. Han, S.J. Kim, V.E. Fedorov Piezoresistive Effect in Polycrystalline Bulk and Film Layered Sulphide W0.95Re0.05S2 ....................................................................................................................................................................................... 16 V.A. Kuznetsov, A.I. Romanenko, A.S. Berdinsky, A.Yu. Ledneva, S.B. Artemkina, V.E. Fedorov Luminescent Diagnostics in the NIR-region on a Base of Yb-porphyrin Complexes .............................................................................................................................. 20 V.D. Rumyantseva, I.P. Shilov, Yu.V. Alekseev, A.S. Gorshkova Simulation Study of the Composite Silicon Solar Cell Efficiency Sensitivity to the Absorption Coefficients and the Thickness of intrinsic Absorber Layer .............. 24 V. Tudić, N. Posavec The Investigation of Influence of Localized States on a-Si:H p-i-n Photodiode Transient Response to Blue Light Impulse with Blue Light Optical Bias ....................... 30 M. Ĉović, V. Gradišnik, Ţ. Jeriĉević Analysis of Electrical and Optical Characteristics of InP/InGaAs Avalanche Photodiodes in Linear Regime by a New Simulation Environment ................................. 34 T. Kneţević, T. Suligoj V Design of Passive-Quenching Active-Reset Circuit with Adjustable Hold-Off Time for Single-Photon Avalanche Diodes ......................................................................... 40 I. Berdalović, Ţ. Osreĉki, F. Šegmanović, D. Grubišić, T. Kneţević, T. Suligoj Impact of the Emitter Polysilicon Thickness on the Performance of High-Linearity Mixers with Horizontal Current Bipolar Transistors ............................. 46 J. Ţilak, M. Koriĉić, H. Mochizuki, S. Morita, T. Suligoj Fully-integrated Voltage Controlled Oscillator in Low-cost HCBT Technology ............................................................................................................................. 51 M. Koriĉić, J. Ţilak, H. Mochizuki, S. Morita, T. Suligoj Variable-Gain Amplifier for Ultra-Low Voltage Applications in 130nm CMOS Technology ............................................................................................................................. 57 D. Arbet, M. Kováĉ, L. Nagy, V. Stopjaková, M. Šovĉík Relaxation Oscillator Calibration Technique with Comparator Delay Regulation .............................................................................................................................. 63 J. Mikulić, G. Schatzberger, A. Barić A Bootstrap Circuit for DC–DC Converters with a Wide Input Voltage Range in HV-CMOS ............................................................................................................. 68 N. Mitrović, R. Enne, H. Zimmermann A Fractional-N Subsampling PLL based on a Digital-to-Time Converter ...................... 72 N. Markulic, K. Raczkowski, P. Wambacq, J. Craninckx Infrared Protection System for High-Voltage Testing of SiC and GaN FETs used in DC-DC Converters ................................................................................................... 78 F. Hormot, J. Baĉmaga, A. 
Barić Optimal Conduction Angle of an E-PHEMT Harmonic Frequency Multiplier ............................................................................................................................... 82 K. Martinĉić Ultra-Wideband Transmitter Based on Integral Pulse Frequency Modulator T. Matić, M. Herceg, J. Job, L. Šneler .................................................................................... 86 Design of a Transmitter for High-Speed Serial Interfaces in Automotive MicroController ............................................................................................................................... 90 A. Bandiziol, W. Grollitsch, F. Brandonisio, R. Nonis, P. Palestri Application of the Calculation-Experimental Method in the Design of Microwave Filters .................................................................................................................. 95 A.S. Geraskin, A.N. Savin, I.A. Nakrap, V.P. Meshchanov Minimax Design of Multiplierless Sharpened CIC Filters Based on Interval Analysis ................................................................................................................................ 100 G. Molnar, A. Dudarin, M. Vuĉić VI Minimization of Maximum Electric Field in High-Voltage Parallel-Plate Capacitor .............................................................................................................................. 105 R. Bleĉić, Q. Diduck, A. Barić Modelling SMD Capacitors by Measurements ................................................................. 110 R. Mišlov, M. Magerl, S. Fratte-Sumper, B. Weiss, C. Stockreiter, A. Barić Impact of Capacitor Dielectric Type on the Performance of Wireless Power Transfer System ................................................................................................................... 116 D. Vinko, P. Oršolić Switching Speed and Stress Analysis for Fixed-fixed Beam Based Shunt Capacitive RF MEMS Switches ......................................................................................... 120 A. Kumar A., R. R Performance Analysis of Micromirrors - Lift-off and von Mises Stress ........................ 126 S. Finny, R. R Material and Orientation Optimization for Quality Factor Enhancement of BAW Resonators ............................................................................................................. 130 R. Raj R.S., R. R Impact of Propagation Medium on Link Quality for Underwater and Underground Sensors ......................................................................................................... 135 G. Horvat, D. Vinko, J. Vlaović Electrical Field Intensity Model on the Surface of Human Body for Localization of Wireless Endoscopy Pill ........................................................................... 141 B. Lukovac, A. Koren, A. Marinĉić, D. Šimunić Wide Band Current Transducers in Power Measurment Methods - an Overview .............................................................................................................................. 146 R. Malarić, Ţ. Martinović, M. Dadić, P. Mostarac, Ţ. Martinović Laboratory Model for Design and Verification of Synchronous Generator Excitation Control Algorithms ........................................................................................... 152 S. Tusun, I. Erceg, I. 
Sirotić The European Project SolarDesign Illustrating the Role of Standardization in the Innovation System .................................................................................................... 158 W. Brenner, N. Adamovic Open Public Design Methodology and Design Process .................................................... 164 D. Rembold, S. Jovalekic VII DISTRIBUTED COMPUTING, VISUALIZATION AND BIOMEDICAL ENGINEERING INVITED PAPER Views on the Role and Importance of Dew Computing in the Service and Control Technology ............................................................................................................. 175 Z. Šojat, K. Skala PAPERS DISTRIBUTED COMPUTING AND CLOUD COMPUTING Parameters That Affect the Parallel Execution Speed of Programs in Multi-Core Processor Computers ...................................................................................... 185 V. Xhafa, F. Dika Federated Computing on the Web: the UNICORE Portal .............................................. 190 M. Petrova-El Sayed, K. Benedyczak, A. Rutkowski, B. Schuller Problem-Oriented Scheduling of Cloud Applications: PO-HEFT Algorithm Case Study ............................................................................................................................ 196 E.A. Nepovinnykh, G.I. Radchenko Towards a Novel Infrastructure for Conducting High Productive Cloud-Based Scientific Analytics .............................................................................................................. 202 P. Brezany, T. Ludescher, T. Feilhauer An OpenMP Runtime Profiler/Configuration Tool for Dynamic Optimization of the Number of Threads .................................................................................................. 208 T. Dancheva, M. Gusev, V. Zdravevski, S. Ristov An Effective Task Scheduling Strategy in Multiple Data Centers in Cloud Scientific Workflow ............................................................................................................. 214 E.I. Djebbar, G. Belalem Visualisation in the ECG QRS Detection Algorithms ...................................................... 218 A. Ristovski, A. Guseva, M. Gusev, S. Ristov Analysis and Comparison of Algorithms in Advanced Web Clusters Solutions ............................................................................................................................... 224 D. Alagić, K. Arbanas Metamodeling as an Approach for Better Computer Resources Allocation in Web Clusters ........................................................................................................................ 230 D. Alagić, D. Maĉek VIII Showers Prediction by WRF Model above Complex Terrain ......................................... 236 T. Davitashvili, N. Kutaladze, R. Kvatadze, G. Mikuchadze, Z. Modebadze, I. Samkharadze Methods and Tools to Increase Fault Tolerance of High-Performance Computing Systems ............................................................................................................. 242 I.A. Sidorov Logical-Probabilistic Analysis of Distributed Computing Reliability ............................ 247 A.G. Feoktistov, I.A. Sidorov Distributed Graph Reduction Algorithm with Parallel Rigidity Maintenance ............. 253 D. Sušanj, D. Arbula Architecture of Virtualized Computational Resource Allocation on SDN-enhanced Job Management System Framework .................................................... 257 Y. Watashiba, S. Date, H. Abe, K. Ichikawa, Y. Kido, H. Yamanaka, E. 
Kawai, S. Shimojo Near Real-time Detection of Crisis Situations .................................................................. 263 S. Girtelschmid, A. Salfinger, B. Pröll, W. Retschitzegger, W. Schwinger Automatic Protocol Based Intervention Plan Analysis in Healthcare ............................ 269 M. Kozlovszky, L. Kovács, K. Batbayar, Z. Garaguly Using Fourier and Hartley Transform for Fast, Approximate Solution of Dense Linear Systems ..................................................................................................................... 274 Ţ. Jeriĉević, I. Koţar Procedural Generation of Mediterranean Environments ............................................... 277 N. Mikuliĉić, Ţ. Mihajlović Energy-Aware Power Management of Virtualized Multi-core Servers through DVFS and CPU Consolidation ............................................................................ 283 H. Rostamzadeh Hajilari, M.M. Talebi, M. Sharifi Human Posture Detection Based on Human Body Communication with Muti-carriers Modulation ................................................................................................... 289 W. Ni, Y. Gao, Ţ. Luĉev Vasić, S.H. Pun, M. Cifrek, M.I. Vai, M. Du SAT-Based Search for Systems of Diagonal Latin Squares in Volunteer Computing Project SAT@home ........................................................................................ 293 O. Zaikin, S. Kochemazov, A. Semenov Architectural Models for Deploying and Running Virtual Laboratories in the Cloud .................................................................................................................................... 298 E. Afgan, A. Lonie, J. Taylor, K. Skala, N. Goonasekera A CAD Service for Fusion Physics Codes ......................................................................... 303 M. Telenta, L. Kos IX Correlation between Attenuation of 20 GHz Satellite Communication Link and Liquid Water Content in the Atmosphere ........................................................................ 308 M. Kolman, G. Kosec Practical Implementation of Private Cloud with Traffic Optimization ......................... 314 D.G. Grozev, M.P. Shopov, N.R. Kakanakov Improving Data Locality for NUMA-Agnostic Numerical Libraries ............................. 320 P. Zinterhof Use Case Diagram Based Scenarios Design for a Biomedical Time-Series Analysis Web Platform ....................................................................................................... 326 A. Jović, D. Kukolja, K. Jozić, M. Cifrek Augmented Reality for Substation Automation by Utilizing IEC 61850 Communication ................................................................................................................... 332 M. Antonijević, S. Suĉić, H. Keserica Innovation of the Campbell Vision Stimulator with the Use of Tablets ........................ 337 J. Brozek, M. Jakes, V. Svoboda Classification of Scientific Workflows Based on Reproducibility Analysis ................... 343 A. Bánáti, P. Kacsuk, M. Kozlovszky Dynamic Execution of Scientific Workflows in Cloud ..................................................... 348 E. Kail, J. Kovács, M. Kozlovszky, P. Kacsuk FPGA Kernels for Classification Rule Induction ............................................................. 353 P. Škoda, B. 
Medved Rogina VISUALIZATION SYSTEMS Prototyping of Visualization Designs of 3D Vector Fields Using POVRay Rendering Engine ................................................................................................................ 361 J. Opiła New Cybercrime Taxonomy of Visualization of Data Mining Process .......................... 367 M. Babiĉ, B. Jerman-Blaţiĉ Visual Representation of Predictions in Software Development Based on Software Metrics History Data .......................................................................................... 370 B. Popović, A. Balota, Dţ. Strujić Interaction with Virtual Objects in a Natural Way ......................................................... 376 I. Prazina, K. Balić, K. Pršeš, S. Rizvić, V. Okanović Bone Shape Characterization Using the Fourier Transform and Edge Detection in Digital X-Ray Images ..................................................................................... 380 D. Sušanj, G. Gulan, I. Koţar, Ţ. Jeriĉević X GIS in the e-Government Platform to Enable State Financial Subsidies Data Transparency ....................................................................................................................... 383 M. Kranjac, U. Sikimić, I. Simić, M. Paroški, S. Tomić Evaluation of Caching Techniques for Video on Demand in Named Data Networks .............................................................................................................................. 388 K. Jakimoski, S. Arsenovski, L. Gorachinova, S. Chungurski, O. Iliev, L. Djinevski, E. Kamcheva BIOMEDICAL ENGINEERING Diagnostic of Asthma Using Fuzzy Rules Implemented in Accordance with International Guidelines and Physicians Experience ...................................................... 395 A. Badnjević, L. Gurbeta, M. Cifrek, D. Marjanović Robust Beat Detection on Noisy Differential ECG .......................................................... 401 P. Lavriĉ, M. Depolli Classification of Asthma Using Artificial Neural Network ............................................. 407 A. Badnjević, L. Gurbeta, M. Cifrek, D. Marjanović Brain-Computer Interface Based on Steady-State Visual Evoked Potentials ............... 411 K. Friganović, M. Medved, M. Cifrek Comparison of Wireless Electrocardiographic Monitoring and Standard ECG in Dogs ........................................................................................................................ 416 A. Krvavica, Š. Likar, M. Brloţnik, A. Domanjko-Petriĉ, V. Avbelj A Medical Cloud .................................................................................................................. 420 J. Tasiĉ, M. Gusev, S. Ristov A Hospital Cloud-Based Open Archival Information System for the Efficient Management of HL7 Big Data ........................................................................................... 426 A. Celesti, M. Fazio, A. Romano, M. Villari Recognition and Adjustment for Strip Background Baseline in Fluorescence Immuno-chromatographic Detection System ................................................................... 432 Y. Gao, C. Lin, S.H. Pun, M.I. Vai, M. Du Agile Development of a Hospital Information System ..................................................... 436 S.L.R. Vrhovec SOA Based Interoperability Component for Healthcare Information System ............. 442 D. Kuĉak, G. Đambić, V. 
Kokanović Wireless Intrabody Communication Sensor Node Realized Using PSoC Microcontroller .................................................................................................................... 446 F. Grilec, Ţ. Luĉev Vasić, W. Ni, Y. Gao, M. Du, M. Cifrek XI Detection of Heart Rate Variability from a Wearable Differential ECG Device .................................................................................................................................... 450 J. Slak, G. Kosec Penetration of the ICT Technology to the Health Care Primary Sector – Ljubljana PILOT ................................................................................................................ 456 T. Poplas Susiĉ, U. Staniĉ Image-Based Metal Artifact Reduction in CT Images ..................................................... 462 A. Šerifović-Trbalić, A. Trbalić New Algorithm for Automatic Determination of Systolic and Diastolic Blood Pressures in Oscillometric Measurements ............................................................. 467 V. Jazbinšek TGTP-DB – a Database for Extracting Genome, Transcriptome and Proteome Data Using Taxonomy ....................................................................................... 472 K. Kriţanović, M. Marinović, A. Bulović, R. Vaser, M. Šikić Development and Perspectives of Biomedical Engineering in South East European Countries 477 A. Badnjević, L. Gurbeta Clustering of Heartbeats from ECG Recordings Obtained with Wireless Body Sensors ........................................................................................................................ 481 A. Rashkovska, D. Kocev, R. Trobec Heart Rate Analysis with NevroEkg .................................................................................. 487 M. Mohorĉiĉ, M. Depolli TELECOMMUNICATIONS & INFORMATION FNS • SPECIAL SESSION ON FUTURE NETWORKS AND SERVICES PAPERS A Survey of IoT Cloud Providers ...................................................................................... 497 T. Pflanzner, A. Kertesz QoS-Aware Deployment of Data Streaming Applications over Distributed Infrastructures ..................................................................................................................... 503 M. Nardelli QoS-Aware Application Placement Over Distributed Cloud .......................................... 509 F. Bianchi, F. Lo Presti XII Energy-Aware Control of Server Farms ........................................................................... 515 M.E. Gebrehiwot, S. Aalto, P. Lassila SDN Based Service Provisioning Management in Smart Buildings ............................... 521 M. Tošić, O. Iković, D. Bošković TELECOMMUNICATIONS & INFORMATION INVITED PAPER Time Series Analysis and Possible Applications ............................................................... 531 M. Ivanović, V. Kurbalija PAPERS WIRELESS COMMUNICATIONS AND TECHNOLOGIES Wireless Resonant Power Transfer – An Overview ......................................................... 543 Ţ. Martinović, M. Dadić, R. Malarić, Ţ. Martinović Investigation of a Small Handheld PCB Nesting Two Antennas NFC 13.56 MHz and to RF 868 MHz .......................................................................................... 550 L.A. Iliev, I.S. Stoyanov, T.B. Iliev, E.P. Ivanova, Gr.Y. Mihaylov The Coverage Belt for Low Earth Orbiting Satellites ..................................................... 554 S. 
Cakaj The Investigation of the Effect of the Carrier Frequency Offset (CFO) in SC-FDMA System ............................................................................................................... 558 N. Taşpinar, M. Balki Performance Analysis of Low Density Parity Check Codes Implemented in Second Generations of Digital Video Broadcasting Standards .................................. 562 Gr.Y. Mihaylov, T.B. Iliev, E.P. Ivanova, I.S. Stoyanov, L.A. Iliev DATA AND IMAGE ANALYSIS Iterative Denoising of Sparse Images ................................................................................ 569 I. Stanković, I. Orović, S. Stanković, M. Daković Compressive Sensing Based Image Processing in TrapView Pest Monitoring System .............................................................................................................. 574 M. Marić, I. Orović, S. Stanković Big Data Analytics for Communication Service Providers ............................................. 579 D. Šipuš Role of Data Analytics in Utilities Transformation .......................................................... 584 V. Ĉaĉković, Ţ. Popović XIII Using MEAN Stack for Development of GUI in Real-Time Big Data Architecture ......................................................................................................................... 590 M. Štajcer, M. Štajcer, D. Orešĉanin ................................................................................................................................. NETWORK TECHNOLOGIES A Survey on Physical Layer Impairments Aware Routing and Wavelength Assignment Algorithms in Transparent Wavelength Routed Optical Networks ................................................................................................... 599 H. Dizdarević, S. Dizdarević, M. Škrbić, N. Hadţiahmetović A Survey on Transition from GMPLS Control Plane for Optical Multilayer Networks to SDN Control Plane ..................................................................... 606 S. Dizdarević, H. Dizdarević, M. Škrbić, N. Hadţiahmetović About the Telco Cloud Management Architectures ........................................................ 614 I. Nenadić, D. Kobal, D. Palata CPE Virtualization by Unifying NFV, SDN and Cloud Technologies ........................... 622 P. Cota, J. Šabec Soft Sensors in Wireless Networking as Enablers for SDN Based Management of Content Delivery ...................................................................................... 628 M. Tošić, O. Iković, D. Bošković A FIRM Approach for Software-Defined Service Composition ..................................... 634 P. Kathiravelu, T. Galinac Grbac, L. Veiga Test Environment & Application as a Service .................................................................. 640 T. Ţitnik, M. Galin, G. Pauković, I. Dević, R. Ĉiţmar, Z. Bosić Workaround Solutions Used During PSTN Migration of Customers to IMS Network ....................................................................................................................... 644 N. Štokić Development of the Generic OFDM Based Transceiver in the LabView Software Environment ........................................................................................................ 650 D. Hamidović, N. Suljanović The Challenge of Cellular Cooperative ITS Services Based on 5G Communications Technology ............................................................................................. 656 Z. Kljaić, P. Škorput, N. 
Amin NETWORKS PERFORMANCES Performance Evaluation of Different Scheduling Algorithms in LTE Systems ................................................................................................................................. 667 A. Marinĉić, D. Šimunić XIV Performance Analysis of LTE Networks with Random Linear Network Coding .................................................................................................................................. 673 T.D. Assefa, K. Kralevska, Y. Jiang VoLTE E2E Performance Management ........................................................................... 679 D. Klobuĉarević, Ţ. Klobuĉarević, D. Belošić Balancing Security and Blocking Performance with Reconfiguration of the Elastic Optical Spectrum ......................................................................................... 684 S. Kumar Singh, W. Bziuk, A. Jukan Ensuring Continuous Operation of Critical Process of Remote Control System at the Level of Network Connectivity ................................................................... 690 I. Fosić, D. Budiša IoT PLATFORM AND APPLICATIONS Requirements and Challenges in Wireless Network‟s Performance Evaluation in Ambient Assisted Living Environments .................................................... 699 A. Koren, D. Šimunić Long Term Evolution as a Precondition for Internet of Postal Things .......................... 703 A. Kosovac, A. Veispahić, M. Berković Advanced Sensing and Internet of Things in Smart Cities ............................................. 707 D. Capeska Bogatinoska, R. Malekian, J. Trengoska, W. Asiama Nyako Security Challenges of the Internet of Things .................................................................. 713 M. Weber, M. Boban Promoting Health for Chronic Conditions: a Novel Approach That Integrates Clinical and Personal Decision Support ......................................................... 719 I. Lasorsa, M. Ajĉević, P. D’Antrassi, G. Carlini, A. Accardo, S. Marceglia A Taxonomy of Localization Techniques Based on Multidimensional Scaling .................................................................................................................................. 724 B. Risteska Stojkoska Distributed Real-Time Lift Kinematic Monitoring Using COTS Smartphones ........................................................................................................................ 730 N. Miškić-Pletenac, K. Lenac Mobile Devices as Authentic and Trustworty Sources in Multi-Agent Systems ................................................................................................................................. 736 V. Vyroubal, A. Stanĉić, I. Grgurević SOFTWARE ENGINEERING Comparative Analysis of Functional and Object-Oriented Programming .................... 745 D. Alić, S. Omanović, V. Giedrimas XV Improving the Composition and Assembly of APIs in Service Dominant Ecosystem Environments .................................................................................................... 751 D. Ramljak Service Level Agreement - SLA, raspoloživost servisa i kvalitet sistema u telekomunikacijama ............................................................................................................ 755 D. Glamoĉanin Challenges of a Service Transition in Multi Domain Environment ............................... 761 I. Golub, B. Radojević Teaching “Ten Commandments” of Software Engineering ............................................ 766 Z. Putnik, M. Ivanović, Z. Budimac, K. 
Bothe Methodologies for Development of Mobile Applications ................................................. 772 Z. Stapić, M. Mijaĉ, V. Strahonja Upravljanje promjenama na primjeru telekom operatera u Jugoistočnoj Evropi ................................................................................................................................... 777 A. Gabela Tehnologije integracije informacijskih sustava ................................................................ 783 A. Stojanović, N. Lazić, Ţ. Kovaĉević TELECOM PRODUCTS, SERVICES AND MARKET Restructuring of Telco Products ........................................................................................ 791 I. Vrbovĉan, T. Pavić, M. Šoša Anić Moving from Network-Centric toward Customer-Centric CSPs in Bosnia and Herzegovina .................................................................................................................. 794 N. Banović-Ćurguz, D. Ilišević Future Communication Model: Challenges and Opportunities for Society as a Whole ............................................................................................................................ 800 D. Ilišević, N. Banović-Ćurguz Some Aspects of Network Management System for Video Service ................................ 805 O. Jukić, I. HeĊi Influence of OTT Service Providers on Croatian Telecomunication Market .................................................................................................................................. 809 I. Draţić Lutilsky, M. Ivić Implementing Shared Service Center in Telecom Environment as More Efficient and More Cost Effective Business Model .......................................................... 814 T. Ţilić, V. Ĉošić XVI ICT APPLICATIONS Usluga mobilnog plaćanja računa m:Pay ......................................................................... 821 V. Ţlof, S. Salapura Praćenje imovinsko pravnih poslova na elektroničkoj komunikacijskoj infrastrukturi putem Web GIS aplikacije ........................................................................ 826 D. Salopek, T. Đigaš, F. Ambroš, M. Štimac Fieldbus Diagnostic Online Solution Program Establishment at Rijeka Oil Refinery ................................................................................................................................ 831 B. Ţeţelj, H. Hajdo Automatic Communication System Ship to Shipping Terminal, for Reporting Potential Malfunctions of a Ballast Water Treatment System Operation ............................................................................................................................. 836 G. Bakalar, M. Baggini AdriaHUB ICT platform .................................................................................................... 841 T. Škorjanc, R. Ţigulić, N. AnĊelić Mjerenje kvaliteta usluge mobilnog plaćanja m:Pay ....................................................... 847 S. Salapura, V. Ţlof Uspostava sustava upravljanja identitetima u Carinskoj upravi ................................... 852 M. Hajnić, D. Cmuk COMPUTERS IN EDUCATION INVITED PAPER New Informatics Curriculum - Croatian Tradition with World Trends ....................... 863 L. Kralj PAPERS Creativity, Communication and Collaboration: Grading with Open Badges ................................................................................................................................... 869 I. Salopek Ĉubrić, G. 
Ĉubrić A Study of Students‟ Attitudes and Perceptions of Digital Scientific Information Landscape ....................................................................................................... 875 R. Vrana Researcher Measured - Towards a Measurement-driven Academia ............................. 881 H. Jaakkola, J. Henno, J. Mäkelä, K. Ahonen XVII Use of „Learning Analytics‟ ................................................................................................. 888 J. Henno, H. Jaakkola, J. Mäkelä Smart Immersive Education for Smart Cities with Support via Intelligent Pedagogical Agents .............................................................................................................. 894 M. Soliman, A. Elsaadany Review of Source-Code Plagiarism Detection in Academia ............................................ 901 M. Novak The Comparison of Impact Offline and Online Presentation on Student Achievements: A Case Study .............................................................................................. 907 P. Esztelecki, G. Kőrösi, Z. Námestovski, L. Major Digital Competences for Teachers: Classroom Practice .................................................. 912 M. Filipović Tretinjak, V. AnĊelić Introducing Collaborative e-Learning Activities to the e-Course “Information Systems” ....................................................................................................... 917 M. Ašenbrener Katić, S. Ĉandrlić, M. Holenko Dlab A Curriculum for Unified Embedded Engineering Education ....................................... 923 I. Kaštelan, M. Temerinac Individual versus Collaborative Learning in a Virtual World ....................................... 929 P. Pürcher, M. Höfler, J. Pirker, L. Tomes, A. Ischebeck, C. Gütl Preparation of a Hybrid e-Learning Course for Gamification ....................................... 934 D. Kermek, D. Strmeĉki, M. Novak, M. Kaniški Implementation of Fundamental Ideas into the Future Managers´ Informatics Education ........................................................................................................ 940 L. Révészová Fostering Creativity in Technology Enhanced Learning ................................................ 946 A. Ţiţić, A. Granić, I. Šitin Teaching Physics in Primary Schools with Tablet Computers: Key Advantages ........................................................................................................................... 952 V. Grubelnik, L. Grubelnik Project Based Learning (PBL) in the Teachers‟ Education ............................................ 957 M. Krašna Didactical Suitability of e-Generated Drill Tests for Physics .......................................... 962 R. Repnik, M. Soviĉ Utilizing MOOCs in the Development of Education and Training Programs .............................................................................................................................. 966 P. Linna, T. Mäkinen, H. Keto XVIII Distance Delivery and Technology-Enhanced Learning in Information Technology and Programming Courses at RIT Croatia ................................................. 970 K. Marasović, B. Mihaljević, I. Baĉić Overview of IT Solutions for Career Services and Quality Assurance at Higher Education ............................................................................................................ 976 E. Gjorgjevska, P. Tonkovikj, M. 
Gusev Selecting the Most Appropriate Web IDE for Learning Programming Using AHP ............................................................................................................................ 982 I. Škorić, B. Pein, T. Orehovaĉki Using Real Projects as Motivators in Programming Education ..................................... 988 M. Konecki, S. Lovrenĉić, M. Kaniški Making Programming Education More Accessible for Visually Impaired ............................................................................................................................... 992 M. Konecki, N. Ivković, M. Kaniški Use of Computer Programs in Teaching Photography Courses at Schools of Applied Arts and Design in Croatia ............................................................ 996 Z. Prohaska, Z. Prohaska, I. Uroda Universtiy Search Engine .................................................................................................. 1002 Ţ. Knok, M. Marĉec Experience with Usage of LMS Moodle not Only for the Educational Purposes at the Educational Institution .......................................................................... 1006 D. Paľová Using Robot Simulation Applications at the University – Experiences with the KUKA Sim ................................................................................................................... 1012 D. Lukac Implementation and Analysis of Open Source Information Systems in Electronic Business Course for Economy Students ....................................................... 1017 H. Jerković, P. Vranešić, G. Slamić Virtual Firms as Education Tool in the Field of eCommerce ....................................... 1023 M. Vejaĉka Systems and Software Assurance - A Model Cyber Security Course .......................... 1028 V. Jovanović, J. Harris Analysis of Learning Management Systems Features and Future Development Challenges in Modern Cloud Environment ............................................. 1033 H. Jerković, P. Vranešić, A. Radan Markov Model of Mathematical Competences in Elementary Education ................... 1039 G. Paić, B. Tepeš, K. Pavlina XIX PYTHON as Pseudo Language for Formal Language Theory ..................................... 1045 Z. Dovedan Han, K. Kocijan, V. Lopina Croatian Students' Attitudes Towards Technology Usage in Teaching Asian Languages – a Field Research ............................................................................... 1051 M. Janjić, S. Librenjak, K. Kocijan Adaptive e-Learning System for Language Learning: Architecture Overview ............................................................................................................................ 1056 V. Slavuj, B. Kovaĉić, I. Jugo L2L – Learn to Learn: Teach to Learn: CARTOON ENGLISH (A constructivist approach to teaching and learning) .................................................... 1061 K. Bedi Facilitating Mobile Learning by Use of Open Access Information Resources ............................................................................................................................ 1067 R. Vrana Work-Based Learning: New Skills for New Technologies ............................................ 1072 M. Lamza Maronić, I. Ivanĉić Creating Assets as a Part of Tertiary Education of Technical Domains ...................... 1078 J. Brozek, D. Hamernik, Z. 
Kopecky Software Solution Incorporating the Steganographic Principle for Hiding Pictures within Pictures .................................................................................................... 1084 J. Brozek, J. Marek, V. Svoboda A Platform Independent Tool for Programming, Visualization and Simulation of Simplified FPGAs ...................................................................................... 1091 M. Ĉupić, K. Brkić, Ţ. Mihajlović Digital Risks and Experiences of Future Teachers ........................................................ 1097 T. Bratina A Study of Factors Influencing Higher Education Teachers' Intention to Use e-Learning in Hybrid Environments ........................................................................ 1103 S. Babić, M. Ĉiĉin-Šain, G. Bubaš Development and Implementation of E-Learning System in Smart Educational Environment ................................................................................................. 1109 A. Elsaadany, K. Abbas Introducing Inquiry-Based Learning to Estonian Teachers: Experiences from the Creative Classroom Project .............................................................................. 1115 N. Hoić-Boţić, M. Laanpere, K. Pata, I. Franković, S. Teder Mobile Robots Approach for Teaching Programming Skills in Schools ..................... 1121 W. Werth, C. Ungermanns XX Age Independent Examination of Algorithm Creating Abilities .................................. 1125 Z.A. Godó, D. Kocsis, G. Kiss, G. Stóka The Digitalization Push in Universities ........................................................................... 1130 H. Jaakkola, H. Aramo-Immonen, J. Henno, J. Mäkelä Toby the Explorer – an Interactive Educational Game for Primary School Pupils .................................................................................................................................. 1137 N. Kaevikj, A. Kostadinovska, B. Risteska Stojkoska, M. Mihova, K. Trivodaliev The Use of Contemporary e-Services and e-Contents at Mother Tongue Classes ................................................................................................................................ 1142 V. Jesenek Migration from in-House LMS to Google Classroom: Case of SEEU ......................... 1145 L. Abazi Bexheti, A. Kadriu, M. Apostolova Trpkovska Survey Analyses of Impacting Factors in ICT Usage in School Management: Case Study ................................................................................................. 1149 B. Fetaji, M. Fetaji, R. Azemi, M. Ebibi Case Study Analyses of Semantic Security Using SQL Injection in Web Enabled ORACLE Database ............................................................................................ 1155 M. Fetaji, B. Fetaji, M. Ebibi Using Web Applications in Education ............................................................................. 1161 A. Babić, S. Vukmirović, Z. Ĉapko Qualitative Approach to Determining the Relevant Facets of Mobile Quality of Educational Social Web Applications ........................................................... 1165 T. Orehovaĉki, S. Babić MeĎukurikularni projekti u nastavi informatike u Ekonomskoj školi – primjeri dobre prakse ....................................................................................................... 1171 S. Bulešić Milić Tradicionalni ili hibridni model nastave računalstva .................................................... 1174 M. Sertić, K. 
Šolić Provjere znanja pomoću Classroom Managera u učionicama budućnosti .......................................................................................................................... 1180 M. Korać Primjena e-učenja u hrvatskom vojnom obrazovanju ................................................... 1184 D. Moţnik Prezentacijski alati za prikaz matematičkih sadržaja ................................................... 1190 M. Štefan Trubić, I. Radošević XXI Siguran put do škole .......................................................................................................... 1196 D. Šokac, I. Biuklija Primjena obrazovne društvene mreže Edmodo u nastavi III. osnovne škole Čakovec ..................................................................................................................... 1199 N. Boj Digitalni scenariji učenja .................................................................................................. 1203 M. Mirković Detekcija najčešćih sintaktičkih i logičkih grešaka učenika kod stvaranja programa u početnim godinama učenja programiranja .............................. 1209 K. Blaţeka Nastava matematike na SageMathCloud platformi ....................................................... 1215 Ţ. Tutek Uvod u robotiku - Arduino platforma i web aplikacija ................................................. 1218 A. Lacković, B. Fulanović Informacijski sustav visokih učilišta - analiza slučaja za Veleučilište u Šibeniku .............................................................................................................................. 1222 S. Krajaĉić, L. Topolĉić, F. Urem Mobilne aplikacije u visokom obrazovanju .................................................................... 1225 M. Blašković, M. Fumić, F. Urem Metodologija izrade E – learning sadržaja za edukaciju o izradi Standarda zanimanja ........................................................................................................ 1230 I. Vunarić, S. Grgić, T. Babić Uloga IKT u razvoju financijske pismenosti djece ........................................................ 1235 I. Ruţić Informacijsko-komunikacijske znanosti u nastavi - digitalizirani materijali za učenje ........................................................................................................... 1239 T. Babić, A. Ogrin, M. Babić Istraživanje stavova i očekivanja studenata prilikom upisa na studij kao metoda povećanja kvalitete usluge u visokom obrazovanju .................................. 1245 T. Babić, S. Grgić, E. Rajković E-obrazovanjem do fleksibilnog modela učenja ............................................................. 1250 M. Boţurić, R. Bogut, M. Tretinjak Preporuke i primjeri dobre prakse e-učenja u hrvatskom visokom školstvu ............. 1254 D. Junaković, I. Paćelat, F. Urem XXII Ilustracija primjene novog Kurikuluma iz predmeta Informatika i to domene - Računalno razmišljanje i programiranje na primjeru metode Početnica Mema za prvi razred osnovne škole ............................................................... 1258 M. Ĉiĉin-Šain, S. Babić, L. Kralj Izloženost i navike korištenja medija i računala kod djece u razrednoj nastavi ................................................................................................................................. 1262 T. Paviĉić, J. 
Šurić COMPUTERS IN TECHNICAL SYSTEMS INVITED PAPERS Architecture and Application of Virtual Desk and 3D Process Simulation for Wire Rod Rolling Mills ..................................................................................................... 1271 A. Venuti Use of Offline Computational Tools for Plant Data Analysis and Setup Model Calibration: a Perspective in the Industry of Flat Metal Production .......................................................................................................................... 1276 C. Aurora, F.A. Cuzzola Architecture and Implementation of a MES System in a Large Scale Steel Plant: Severstal Cherepovets Success Story ................................................................... 1280 G. Brunetti PAPERS Anfis as a Method for Determinating MPPT in the Photovoltaic System Simulated in Matlab/Simunlink ....................................................................................... 1289 D. Mlakić, S. Nikolovski Linear Motion Calculation of the High Voltage Circuit Breaker Contacts Using Rotary Motion Measurement with Nonlinear Transfer Function ..................... 1294 K. Obarĉanin, R. Ostojić Robot Arm Teleoperation via RGBD Sensor Palm Tracking ....................................... 1300 F. Marić, I. Jurin, I. Marković, Z. Kalafatić, I. Petrović A Proposal for a Fully Distributed Flight Control System Design ............................... 1306 M. Šegvić, K. Krajĉek Nikolić, E. Ivanjko Control of Thermal Process with Simulink and NI USB-6211 in Real Time ........................................................................................................................... 1311 I. Tikvić, G. Vujisić, M. Fruk Stabilization of Multi-AUV Formation with Digital Control ........................................ 1315 S.A. Ul’yanov, N.N. Maksimkin XXIII A Hybrid Approach to Solve the Dynamic Patrol Routing Problem for Group of Underwater Robots ........................................................................................... 1321 M.Yu. Kenzin, I.V. Bychkov, N.N. Maksimkin Multi - Heater Induction Heating System with Sandwich Material Heater ................................................................................................................................. 1327 A. Smrke Two-Rate Motion Control of VTAV by NARMA-L2 Controller for Enhanced Situational Awareness ..................................................................................... 1333 I. Astrov LADDER Program Solution for Multi-probe Monitoring and Control in Simple Cooling Process .................................................................................. 1339 T. Špoljarić, M. Špoljarić An M2M Solution for Smart Metering in Electrical Power Systems ........................... 1348 M.P. Shopov Noise within a Data Center ............................................................................................... 1352 D. Miljković Active Noise Control: From Analog to Digital – Last 80 Years .................................... 1358 D. Miljković Responding to Stakeholders‟ Resistance to Change in Software Projects – A Literature Review .......................................................................................................... 1364 S.L.R. Vrhovec Object-Oriented Programming Model for Synthesis of Domain-Specific Application Development Environment .......................................................................... 1369 T. Lugarić, Z. Pavlić, D. 
Škvorc Logistic and Production Computer Systems in Small-Medium Enterprises ......................................................................................................................... 1375 M. Pighin The Implications of Employing Component Based Software Design in Non-Commercial Applications ......................................................................................... 1380 B. Zorić, G. Martinović, I. Crnković Extended Approach to Selecting a Project-specific Reliability Growth Model .................................................................................................................................. 1386 J. Krini, A. Krini, O. Krini, J. Börcsök Embedded Linux Controlled Sensor Network ............................................................... 1392 M. Saari, A.M. Baharudin, P. Sillberg, P. Rantanen, J. Soini XXIV Portable Sensor System for Reliable Condition Measurement ..................................... 1397 J. Soini, P. Sillberg, P. Rantanen, J. Nummela Architecture of an Interoperable IoT Platform Based on Microservices ..................................................................................................................... 1403 T. Vresk, I. Ĉavrak Performance Estimation in Heterogeneous MPSoC Based on Elementary Operation Cost .............................................................................................. 1409 N. Frid, D. Ivošević, V. Sruk Sustav za lociranje atmosferskih pražnjenja u identifikaciji kvarova TK mreže uzrokovanih atmosferskim prenaponima ............................................................. 1413 V. Milardić, B. Franc, M. Budimirović SNUPI - Sustav za nadzor i upravljanje procesima infrastrukture podatkovnog centra ........................................................................................................... 1420 M. Zmijanac INTELLIGENT SYSTEMS BiForD • SPECIAL SESSION ON BIOMETRICS & FORENSICS & DEIDENTIFICATION AND PRIVACY PROTECTION KEYNOTE SPEECH Face Alignment: Addressing Pose Variability in Face Recognition Systems ............................................................................................................................... 1433 V. Štruc PAPERS Shape and Texture Combined Face Recognition for Detection of Forged ID Documents ....................................................................................................... 1437 D. Sáez-Trigueros, H. Hertlein, L. Meng, M. Hartnett Simple Method Based on Complexity for Authorship Detection of Text ..................... 1443 L. Meluch, I. Tokárová, P. Farkaš, F. Schindler Privacy Protection Performance of De-identified Face Images with and without Background .......................................................................................................... 1448 Z. Sun, L. Meng, A. Ariyaeeinia, X. Duan, Z.-H. Tan Deep Metric Learning for Person Re-Identification and De-Identification ................................................................................................................ 1454 I. Filković, Z. Kalafatić, T. Hrkać XXV Deformable Part-Based Robust Face Detection under Occlusion by Using Face Decomposition into Face Components ......................................................... 1459 D. Marĉetić, S. Ribarić Creating a Face Database for Age Estimation and Classification ................................ 1465 P. Grd, M. 
Baĉa Forensic Anthropometry from Voice: An Articulatory-Phonetic Approach ............................................................................................................................ 1469 R. Singh, B. Raj, D. Gencaga INTELLIGENT SYSTEMS PAPERS Computer Vision for the Blind: a Dataset for Experiments on Face Detection and Recognition ................................................................................................ 1479 S. Carrato, S. Marsi, E. Medvet, F.A. Pellegrino, G. Ramponi, M. Vittori Impact of Light Conditions on the Vertical Traffic Signs Detection in Vertical Traffic Signs Recognition System ..................................................................... 1485 D. Solus, Ľ. Ovseník, J. Turán Wound Detection and Reconstruction Using RGB-D Camera ..................................... 1490 D. Filko, E.K. Nyarko, R. Cupec Clustering of Affective Dimensions in Pictures: An Exploratory Analysis of the NAPS Database ....................................................................................................... 1496 M. Horvat, K. Jednoróg, A. Marchewka Challenges in Adopting Big Data Strategies and Plans in Organizations ..................................................................................................................... 1502 A. Budin, S. Krajnović A Survey of Intelligent System Techniques for Indian Stock Market Forecasting ......................................................................................................................... 1508 S. Panwar, V.P. Upadhyay, S.K. Bishnoi The Effect of Class Distribution on Classification Algorithms in Credit Risk Assessment ................................................................................................................. 1514 K. Andrić, D. Kalpić Software Solution for Optimal Planning of Sales Persons Work Based on Depth-First Search and Breadth-First Search Algorithms ........................................... 1521 E. Ţunić, A. Djedović, B. Ţunić Iterated Local Search Algorithm for Planning the Sequence of Arrivals and Departures at Airport Runways ............................................................................... 1527 E. Bytyçi, K. Sylejmani, A. Dika XXVI Energy Efficiency with Intelligent Light Management Systems ................................... 1532 I. Britvić, A. Nikitović Adaptive and Modular Urban Smart Infrastructure .................................................... 1538 M. Klarić, I. Kuzle, I. Livaja Automatic Pathole and Speed Breaker Detection Using Android System ................................................................................................................................. 1543 V. Rishiwal, H. Khan The Influence of the CAPTCHA Types to Its Solving Times ........................................ 1547 D. Brodić, S. Petrovska, M. Jevtić, Z.N. Milivojević Techniques and Applications of Emotion Recognition in Speech ................................ 1551 S. Lugović, I. DunĊer, M. Horvat Word Occurrences and Emotions in Social Media: Case Study on a Twitter Corpus .................................................................................................................. 1557 I. DunĊer, M. Horvat, S. Lugović The Application of Parameterized Algorithms for Solving SAT to the Study of Several Discrete Models of Collective Behavior .............................................. 1561 S. Kochemazov, A. Semenov, O. 
Zaikin Logical-Algebraic Equations Application in Discrete-Event Systems Studying ............................................................................................................................. 1566 N. Nagul An Evaluation Framework and a Brief Survey of Decision Tree Tools .................................................................................................................................... 1572 N. Vlahović Positive Constructed Formulas Preprocessing for Automatic Deduction ........................................................................................................................... 1578 E. Cherkashin, A. Davydov, A. Larionov Monte-Carlo Randomized Algorithm: Empirical Analysis on Real-World Information Systems .................................................................................... 1582 R. Kudelić, D. Oreški, M. Konecki Control Flow Graph Visualization in Compiled Software Engineering ...................... 1586 A. Mikhailov, A. Hmelnov, E. Cherkashin, I.V. Bychkov Bottom-Left and Sequence Pair for Solving Packing Problems ................................... 1591 T. Rolich, D. Domović, M. Golub Automatic Image Annotation Refinement ...................................................................... 1597 M. Pobar, M. Ivašić-Kos XXVII Defining Ontology Combining Concepts of Massive Multi-Player Online Role Playing Games and Organization of Large-Scale Multi-Agent Systems ............................................................................................................................... 1603 B. Okreša Đurić, M. Schatten Comparison of Solution Representations for Scheduling in the Unrelated Machines Environment ..................................................................................................... 1609 M. Đurasević, D. Jakobović INFORMATION SYSTEMS SECURITY PAPERS TECHNICAL TRACK Technical Recommendations for Improving Security of Email Communications ................................................................................................................ 1623 A. Malatras, I. Coisel, I. Sanchez Performance Analysis of Two Open Source Intrusion Detection Systems ............................................................................................................................... 1629 B. Brumen, J. Legvart Challenges of Mobile Device Use in Healthcare ............................................................. 1635 S.L.R. Vrhovec Safe Use of Mobile Devices in the Cyberspace ............................................................... 1639 S.L.R. Vrhovec Securing Web Content and Services in Open Source Content Management Systems ........................................................................................................ 1644 H. Jerković, P. Vranešić, S. Dadić Can Malware Analysts be Assisted in Their Work Using Techniques from Machine Learning? .................................................................................................. 1650 I. Novković, S. Groš Performance Evaluation of a Rule-Based Access Control Framework ....................... 1656 S.A. Afonin SOCIAL ENGINEERING TRACK Going White Hat: Security Check by Hacking Employees Using Social Engineering Techniques ................................................................................................... 1663 Z. Lovrić Švehla, I. Sedinić, L. Pauk XXVIII Analysis of Phishing Attacks against Students ............................................................... 1667 J. Andrić, D. Oreški, T. 
Kišasondi What Do Students Do with Their Assigned Default Passwords? ................................. 1674 L. Bošnjak, B. Brumen Analysing Real Students‟ Passwords and Students‟ Passwords Characteristics Received From a Questionnaire ............................................................ 1680 V. Taneski, M. Heriĉko, B. Brumen MISC TRACK Using DEMF in Process of Collecting Volatile Digital Evidence .................................. 1689 M. Baĉa, J. Ćosić, P. Grd From Safe Harbour to European Data Protection Reform ........................................... 1694 T. Katulić, G. Vojković Information Security Assessment in Nature Parks ........................................................ 1699 S. Aksentijević, T. Đugum, K. Šakić Clustering Approach for User Location Data Privacy in Telecommunication Services ............................................................................................ 1706 M. Vuković, M. Kordić, D. Jevtić Analiza sigurnosnih ranjivosti inteligentnih sučelja za upravljanje podatkovnim centrom ....................................................................................................... 1711 M. Ramljak BUSINESS INTELLIGENCE SYSTEMS PAPERS Analyzing Air Pollution on the Urban Environment ..................................................... 1723 E. Baralis, T. Cerquitelli, S. Chiusano, P. Garza, M.R. Kavoosifar Application of Model Driven Architecture for Development of Data Consolidation Web-System ............................................................................................... 1729 A.A. Korobko, L.F. Nozhenkova Business Process Management Systems Selection Guiedelines: Theory and Practice ............................................................................................................................... 1735 V. Bosilj Vukšić, L. Brkić, M. Baranović Organization of Tax Data Warehouse for Legal Entities .............................................. 1741 M. Sretenović, B. Kovaĉić, V. Jovanović XXIX Predictive Analytics in Big Data Platforms – Comparison and Strategies ............................................................................................................................ 1747 M. Zekić-Sušac, A. Has The Analysis of CSFs in Stages of ERP Implementation - Case Study in Small and Medium - Sized (SME) Companies in Croatia ............................................. 1753 M. Nikitović, V. Strahonja Model optimizacije procesa s primjenom na punjenju bankomata .............................. 1759 I. Osman, K. Bokulić DIGITAL ECONOMY AND GOVERNMENT, LOCAL GOVERNMENT, PUBLIC SERVICES PAPERS The Modern Approach to the Analysis of Logistics Information Systems ............................................................................................................................... 1769 A. Iskra, E. Tijan, S. Aksentijević Development of the Data Warehouse Model for Public Authorities Accounts in Croatia ........................................................................................................... 1774 M. Sretenović, B. Kovaĉić, V. Jovanović The Future of Digital Economy in Some SEE Countries (Case study: Croatia, Macedonia, Montenegro, Serbia, Bosnia and Herzegovina) .......................... 1780 M. Vidas-Bubanja, I. Bubanja Effects and Evaluation of Open Government Data Initiative in Croatia ................................................................................................................................ 1786 T. Vraĉić, M. Varga, K. 
Ćurko ICT Technologies and Structured Dialogue: Experience of "Go, go, NGO!" Project .................................................................................................... 1792 N. Kadoić Using ICT Tools for Decision Making Support in Local Government Units .................................................................................................................................... 1798 N. Kadoić, I. Kedmenec The Conceptual Risk Management Model - A Case Study of Varazdin County ................................................................................................................................ 1804 R. Kelemen, M. Biškup, N. Begiĉević ReĊep Electronic Commerce in Croatia and a Comparison of Open Source Tools for the Development of Electronic Commerce ..................................................... 1811 J. Tomljanović, T. Turina, E. Krelja Kurelović XXX The Social Marketing as Prerequisite for the Competitiveness of South-East European Companies .................................................................................... 1817 I. Bubanja Homeostasis and Collaborative Decision Making for Smart and Cognitive Cities ................................................................................................................................... 1822 J. Klasinc Can the Bank Payment Obligation Replace the International Documentary Letter of Credit? ........................................................................................ 1828 R. Bergami Implementation and Design of Cool'n'Project - Web-Based Project Management Software ....................................................................................................... 1834 I. Špeh Analysis of ICT Use in Private Accommodation Rentals in Croatia ............................ 1841 Lj. Zekanović-Korona, J. Grzunov Records Management Challenges and Opportunities: An Australian Perspective ......................................................................................................................... 1847 A. Davies, R. Bergami Effectiveness Analysis of Using Solid State Disk Technology ....................................... 1852 A. Skendţić, B. Kovaĉić, E. Tijan Information and Communication Technologies and the New Forms of Organized Crime in Network Society ......................................................................... 1857 M. Boban Digitalizacija lokalne uprave na primjeru Istarske županije ........................................ 1863 L. Ordanić, N. Šarić-Kekić Digitalna ekonomija – rezultanta disruptivnih tehnologija .......................................... 1869 M. Mauher MIPRO Junior – STUDENT PAPERS PAPERS Technical Diagnosis of Basic Logic Gates ....................................................................... 1879 Z. Tucaković Developing a Parking Monitoring System Based on the Analysis of Images from an Outdoor Surveillance Camera .............................................................. 1884 I.V. Sukhinskiy, E.A. Nepovinnykh, G.I. Radchenko XXXI Laboratory Model of an Elevator: Control with Three Speed Profiles ................................................................................................................................ 1889 A. Jozić, T. Špoljarić, D. Gadţe Security and Privacy in an IT Context – a Low-Cost WIDS Employed against MITM Attacks (concept) ..................................................................................... 1895 N. Poljak, M. Ševo, I. 
Livaja Use of HLA During Customer Flow Simulation in a Polyclinic ................................... 1899 J. Brozek, J. Fikejz, V. Samotan, L. Gago Revealing the Structure of Domain Specific Tweets via Complex Networks Analysis ............................................................................................................. 1904 E. Moĉibob, S. Martinĉić-Ipšić, A. Meštrović Counting Prime Numbers in Paralell - Faster by Reducing the Synchronization Overhead ............................................................................................... 1909 A. Duraković, E. Pajić, I. Branković, E. Kušundţija, S. Karkelja Parallelization Challenges of BFS Traversal on Dense Graphs Using the CUDA Platform ................................................................................................ 1914 H. Milišić, D. Ahmić, H. Sinanović, E. Šarić, A. Asotić, A. Huseinović Buck Converter Controlled by Arduino Uno ................................................................. 1919 H. Kovaĉević, Ţ. Stojanović Audio Phonebook for the Blind People ........................................................................... 1924 G. Popović, U. Pale Heart Rate Variability Analysis Using Different Wavelet Transformations ................................................................................................................ 1930 U. Pale, F. Thürk, E. Kaniusas Istraživanje ransomware napada i prijedlozi za bolju zaštitu ....................................... 1936 M. Rak, M. Ţagar XXXII LIST OF PAPER REVIEWERS Aksentijević, S. Alexin, Z. Antolić, Ţ. Antonić, A. Aramo-Immonen, H. Arbula, D. Ašenbrener Katić, M. Avbelj, V. Babić, D. Babić, S. Bačmaga, J. Bako, N. Balaţ, A. Banek, M. Banek Zorica, M. Barić, A. Basch, D. Bebel, B. Begušić, D. Bellatreche, L. Bibuli, M. Bilas, V. Blaţević, D. Blaţević, Z. Blečić, R. Bogunović, N. Bonastre, J. Bosiljevac, M. Brčić, M. Bregar, K. Brestovec, B. Brezany, P. Britvić, I. Brkić, K. Brkić, L. Brkić, M. Broz, I. Budin, A. Budin, L. Bujan, I. Bujas, G. Buković, M. Butković, Ţ. Car, Ţ. Cifrek, M. Crnković Stumpf, B. Čačković, V. Čandrlić, S. Čeperić, V. Čičin-Šain, M. Čubrilo, M. Čupić, M. Davidović, M. Delač, G. Depolli, M. Dešić, S. Dobrijević, O. Domazet-Lošo, M. Duarte, M. (Croatia) (Hungary) (Croatia) (Croatia) (Finland) (Croatia) (Croatia) (Slovenia) (Croatia) (Croatia) (Croatia) (Croatia) (Serbia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (France) (Italy) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (France) (Croatia) (Croatia) (Slovenia) (Croatia) (Austria) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Slovenia) (Croatia) (Croatia) (Croatia) (Portugal) Dţanko, M. Dţapo, H. Đerek, V. Erceg, I. Eškinja, Z. Fertalj, K. Filjar, R. Fischer, D. Frid, N. Galinac Grbac, T. Gamulin, O. Garza, P. Glavaš, G. Glavaš, J. Gojanović, D. Golfarelli, M. Golub, M. Golubić, S. Gomez Chavez, A. Gracin, D. Granić, A. Grd, P. Grgić, K. Grgurić, A. Groš, S. Grubišić, D. Grţinić, T. Gulić, M. Hadjina, T. Henno, J. Hoić-Boţić, N. Holenko Dlab, M. Horvat, G. Horvat, M. Hrabar, S. Hrkać, T. Humski, L. Hure, N. Ilić, Ţ. Inkret, R. Ipšić, I. Ivanjko, E. Ivašić-Kos, M. Ivković, N. Ivošević, D. Jaakkola, H. Jakobović, D. Jakopović, Ţ. Jakupović, A. Jardas, M. Jarm, T. Jelenković, L. Jevtić, D. Jeţić, G. Joler, M. Jovanovic, V. Jović, A. Kalafatić, Z. Kalpić, D. 
(Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Italy) (Croatia) (Croatia) (Croatia) (Italy) (Croatia) (Croatia) (Germany) (Croatia) (Croatia) (Croatia) (Croatia) (Sweden) (Croatia) (United States) (Croatia) (Croatia) (Croatia) (Estonia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Finland) (Croatia) (Croatia) (Croatia) (Croatia) (Slovenia) (Croatia) (Croatia) (Croatia) (Croatia) (United States) (Croatia) (Croatia) (Croatia) XXXIII Kapus-Kolar, M. Karan, M. Kaštelan, I. Katanić, N. Kaučič, B. Keto, H. Kišasondi, T. Klemenc-Ketiš, Z. Kocev, D. Kocijan, K. Kopčak, G. Koričić, M. Kosec, G. Kovačić, A. Kovačić, B. Krašna, M. Krhen, M. Krivec, S. Krois, I. Krpić, Z. Kudelić, R. Kunda, I. Kušek, M. Lacković, I. Lipovac, A. Lo Presti , F. Lončarić, S. Lovrenčić, A. Lučev Vasić, Ţ. Lučić, D. Lugarić, T. Lukac, D. Ljubić, S. Maček, M. Magdalenić, I. Malarić, R. Mandić, F. Mandić, T. Maračić, M. Marčetić, D. Marinović, I. Marinović, M. Marjanović, M. Markuš, N. Martinčić-Ipšić, S. Matić, T. Mauša, G. Mekovec, R. Mekterović, I. Meng, L. Mezak, J. Mihajlović, Ţ. Mikac, B. Mikuc, M. Milanović, I. Miličević, K. Mišković, N. Mlinarić, H. Močinić, D. Modlic, B. Molnar, G. Mošmondor, M. Mrakovčić, T. XXXIV (Slovenia) (Croatia) (Serbia) (Croatia) (Slovenia) (Croatia) (Croatia) (Croatia) (Slovenia) (United States) (Sweden) (Croatia) (Slovenia) (Croatia) (Croatia) (Slovenia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Italy) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Germany) (Croatia) (Slovenia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (United Kingdom) (Croatia) (Croatia) (Croatia) (Croatia) (Serbia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) Mrković, B. NaĎ, Đ. Nikitović, M. Očko, M. Oletić, D. Orsag, M. Pale, P. Palestri, P. Paspallis, N. Pavlić, Z. Pečar-Ilić, J. Pelin, D. Perić Hadţić, A. Perkovac, M. Perković, T. Petrović, G. Pintar, D. Pivac, B. Pobar, M. Pocta, P. Poljak, M. Poplas Susič, T. Pribanić, T. Pripuţić, K. Ptiček, M. Rashkovska, A. Repnik, R. Resnik, D. Ribarić, S. Rimac-Drlje, S. Ristić, D. Rupčić, S. Seva, J. Sillberg, P. Skala, K. Skočir, P. Skorin-Kapov, L. Soini, J. Soler, J. Sorić, K. Sruk, V. Stanič, J. Stanič, U. Stapić, Z. Stojković, N. Stupar, I. Sučić, S. Suligoj, T. Suţnjević, M. Sviličić, B. Šarolić, A. Šegvić, S. Ševrović, M. Šikić, M. Šilić, M. Škvorc, D. Štajduhar, I. Štih, Ţ. Šunde, V. Švedek, T. Švogor, I. Tanković, N. Tijan, E. (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Italy) (United Kingdom) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Slovakia) (Croatia) (Slovenia) (Croatia) (Croatia) (Croatia) (Slovenia) (Slovenia) (Slovenia) (Croatia) (Croatia) (Croatia) (Croatia) (United Kingdom) (Finland) (Croatia) (Slovenia) (Croatia) (Finland) (Denmark) (Croatia) (Croatia) (Slovenia) (Slovenia) (Croatia) (Croatia) (Croatia) (United States) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) Tomczak, J. Tomić, M. Tralić, D. Trancoso, I. Trobec, R. Trţec, K. Tuomi, P. Uroda, I. Varga, M. Vasić, D. Vidaček-Hainš, V. Vladimir, K. 
Vlahović, N. Vojković, G. Vrančić, K. Vranić, M. (Austria) (Croatia) (Croatia) (Portugal) (Slovenia) (Croatia) (Finland) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) (Croatia) Vrdoljak, B. Vrhovec, S. Vrlika, V. Vukadinović, D. Vuković, M. Weber, M. Werth, W. Zaluški, D. Zereik, E. Zinner, T. Ţonja, S. Zulim, I. Ţgank, A. Ţilak, J. Ţivković, M. Ţulj, S. (Croatia) (Slovenia) (Croatia) (Croatia) (Croatia) (Croatia) (Austria) (Croatia) (Italy) (Germany) (Croatia) (Croatia) (Slovenia) (Croatia) (Croatia) (Croatia) XXXV AUTHOR INDEX Aalto, S. Abazi Bexheti, L. Abbas, K. Abe, H. Accardo, A. Adamovic, N. Afgan, E. Afonin, S.A. Ahmić, D. Ahonen, K. Ajĉević, M. Aksentijević, S. Alagić, D. Alekseev, Yu.V. Alić, D. Ambroš, F. Amin, N. Andrić, J. Andrić, K. AnĊelić, N. AnĊelić, V. Antonijević, M. Apostolova Trpkovska, M. Aramo-Immonen, H. Arbanas, K. Arbet, D. Arbula, D. Ariyaeeinia, A. Arsenovski, S. Artemkina, S.B. Asiama Nyako, W. Asotić, A. Assefa, T.D. Astrov, I. Ašenbrener Katić, M. Aurora, C. Avbelj, V. Azemi, R. Babiĉ, M. Babić, A. Babić, M. Babić, S. Babić, S. Babić, T. Baĉa, M. Baĉić, I. Baĉmaga, J. Badnjević, A. Baggini, M. Baharudin, A.M. Bakalar, G. Balić, K. Balki, M. Balota, A. Bánáti, A. Bandiziol, A. Banović-Ćurguz, N. Baralis, E. Baranović, M. XXXVI 515 1145 1109 257 719 158 298 1656 1914 881 719 1699, 1769 224, 230 20 745 826 656 1667 1514 841 912 332 1145 1130 224 57 253 1448 388 16 707 1914 673 1333 917 1276 416 1149 367 1161 1239 1103 1165, 1258 1230, 1239, 1245 1465, 1689 970 78 395, 407, 477 836 1392 836 376 558 370 343 90 794, 800 1723 1735 Barić, A. Batbayar, K. Bedi, K. Begiĉević ReĊep, N. Belalem, G. Belošić, D. Benedetti, A. Benedyczak, K. Berdalović, I. Berdinsky, A.S. Bergami, R. Berković, M. Bianchi, F. Bishnoi, S.K. Biškup, M. Biuklija, I. Blašković, M. Blaţeka, K. Bleĉić, R. Boban, M. Bogut, R. Boj, N. Bokulić, K. Börcsök, J. Bosić, Z. Bosilj Vukšić, V. Bošković, D. Bošnjak, L. Bothe, K. Boţurić, M. Brandonisio, F. Branković, I. Bratina, T. Brenner, W. Brezany, P. Britvić, I. Brkić, K. Brkić, L. Brloţnik, M. Brodić, D. Brozek, J. Brumen, B. Brunetti, G. Bubanja, I. Bubaš, G. Budimac, Z. Budimirović, M. Budin, A. Budiša, D. Bulešić Milić, S. Bulović, A. Bychkov, I.V. Bytyçi, E. Bziuk, W. Cakaj, S. Capeska Bogatinoska, D. Carlini, G. Carrato, S. Celesti, A. 63, 78, 105, 110 269 1061 1804 214 679 5 190 40 11, 16 1828, 1847 703 509 1508 1804 1196 1225 1209 105 713, 1857 1250 1199 1759 1386 640 1735 521, 628 1674 766 1250 90 1909 1097 158 202 1532 1091 1735 416 1547 337, 1078, 1084, 1899 1629, 1674, 1680 1280 1780, 1817 1103 766 1413 1502 690 1171 472 1321, 1586 1527 684 554 707 719 1479 426 Cerquitelli, T. Cherkashin, E. Chiusano, S. Chiussi, S. Chungurski, S. Cifrek, M. Cmuk, D. Coisel, I. Cota, P. Craninckx, J. Crnković, I. Cupec, R. Cuzzola, F.A. Ĉaĉković, V. Ĉandrlić, S. Ĉapko, Z. Ĉavrak, I. Ĉiĉin-Šain, M. Ĉiţmar, R. Ĉošić, V. Ĉović, M. Ĉubrić, G. Ĉupić, M. Ćosić, J. Ćurko, K. D’Antrassi, P. Dadić, M. Dadić, S. Daković, M. Dancheva, T. Date, S. Davies, A. Davitashvili, T. Davydov, A. Depolli, M. Dević, I. Diduck, Q. Dika, A. Dika, F. Dizdarević, H. Dizdarević, S. Djebbar, E.I. Djedović, A. Djinevski, L. Domanjko-Petriĉ, A. Domović, D. Dovedan Han, Z. Draţić Lutilsky, I. Du, M. Duan, X. Dudarin, A. DunĊer, I. Duraković, A. Đambić, G. Đigaš, T. Đugum, T. Đurasević, M. Ebibi, M. Elsaadany, A. Enne, R. Erceg, I. Esztelecki, P. Farkaš, P. 
1723 1578, 1586 1723 5 388 289, 326, 395, 407, 411, 446 852 1623 622 72 1380 1490 1276 584 917 1161 1403 1103, 1258 640 814 30 869 1091 1689 1786 719 146, 543 1644 569 208 257 1847 236 1578 401, 487 640 105 1527 185 599, 606 599, 606 214 1521 388 416 1591 1045 809 289, 432, 446 1448 100 1551, 1557 1909 442 826 1699 1609 1149, 1155 894, 1109 68 152 907 1443 Fazio, M. Fedorov, V.E. Feilhauer, T. Feoktistov, A.G. Fetaji, B. Fetaji, M. Fikejz, J. Filipović Tretinjak, M. Filko, D. Filković, I. Finny, S. Fischer, I.A. Fosić, I. Franc, B. Franković, I. Fratte-Sumper, S. Frid, N. Friganović, K. Fruk, M. Fulanović, B. Fumić, M. Gabela, A. Gadţe, D. Gago, L. Galin, M. Galinac Grbac, T. Gao, Y. Garaguly, Z. Garza, P. Gebrehiwot, M.E. Gencaga, D. Geraskin, A.S. Giedrimas, V. Girtelschmid, S. Gjorgjevska, E. Glamoĉanin, D. Godó, Z.A. Golub, I. Golub, M. Goonasekera, N. Gorachinova, L. Gorshkova, A.S. Gradišnik, V. Granić, A. Grd, P. Grgić, S. Grgurević, I. Grilec, F. Grollitsch, W. Groš, S. Grozev, D.G. Grubelnik, L. Grubelnik, V. Grubišić, D. Grzunov, J. Gulan, G. Gurbeta, L. Gusev, M. Guseva, A. Gütl, C. Hadţiahmetović, N. Hajdo, H. Hajnić, M. 426 11, 16 202 247 1149, 1155 1149, 1155 1899 912 1490 1454 126 5 690 1413 1115 110 1409 411 1311 1218 1225 777 1889 1899 640 634 289, 432, 446 269 1723 515 1469 95 745 263 976 755 1125 761 1591 298 388 20 30 946 1465, 1689 1230, 1245 736 446 90 1650 314 952 952 40 1841 380 395, 407, 477 208, 218, 420, 976 218 929 599, 606 831 852 XXXVII Hamernik, D. Hamidović, D. Han, M.K. Harris, J. Hartnett, M. Has, A. HeĊi, I. Henno, J. Herceg, M. Heriĉko, M. Hertlein, H. Hmelnov, A. Höfler, M. Hoić-Boţić, N. Holenko Dlab, M. Hormot, F. Horvat, G. Horvat, M. Hrkać, T. Huseinović, A. I. Orović, Ichikawa, K. Iković, O. Iliev, L.A. Iliev, O. Iliev, T.B. Ilišević, D. Ischebeck, A. Iskra, A. Ivanĉić, I. Ivanova, E.P. Ivanović, M. Ivanjko, E. Ivašić-Kos, M. Ivić, M. Ivković, N. Ivošević, D. Jaakkola, H. Jakes, M. Jakimoski, K. Jakobović, D. Janjić, M. Jazbinšek, V. Jednoróg, K. Jeriĉević, Ţ. Jerković, H. Jerman-Blaţiĉ, B. Jesenek, V. Jevtić, D. Jevtić, M. Jiang, Y. Job, J. Jovalekic, S. Jovanović, V. Jović, A. Jozić, A. Jozić, K. Jugo, I. Jukan, A. Jukić, O. Junaković, D. Jurin, I. Kacsuk, P. XXXVIII 1078 650 11 1028 1437 1747 805 881, 888, 1130 86 1680 1437 1586 929 1115 917 78 135 1496, 1551, 1557 1454 1914 574 257 521, 628 550, 562 388 550, 562 794, 800 929 1769 1072 550, 562 531, 766 1306 1597 809 992 1409 881, 888, 1130 337 388 1609 1051 467 1496 30, 274, 380 1017, 1033, 1644 367 1142 1706 1547 673 86 164 1028, 1741, 1774 326 1889 326 1056 684 805 1254 1300 343, 348 Kadoić, N. Kadriu, A. Kaevikj, N. Kail, E. Kakanakov, N.R. Kalafatić, Z. Kalpić, D. Kamcheva, E. Kaniški, M. Kaniusas, E. Karkelja, S. Kaštelan, I. Kathiravelu, P. Katulić, T. Kavoosifar, M.R. Kawai, E. Kedmenec, I. Kelemen, R. Kenzin, M.Yu. Kermek, D. Kertesz, A. Keserica, H. Keto, H. Khan, H. Kido, Y. Kim, S.J. Kiss, G. Kišasondi, T. Klarić, M. Klasinc, J. Klobuĉarević, D. Klobuĉarević, Ţ. Kljaić, Z. Kneţević, T. Knok, Ţ. Kobal, D. Kocev, D. Kochemazov, S. Kocijan, K. Kocsis, D. Kokanović, V. Kolman, M. Konecki, M. Kopecky, Z. Korać, M. Kordić, M. Koren, A. Koriĉić, M. Korobko, A.A. Kőrösi, G. Kos, L. Kosec, G. Kosovac, A. Kostadinovska, A. Kovács, J. Kovács, L. Kováĉ, M. Kovaĉević, H. Kovaĉević, Ţ. Kovaĉić, B. Kozlovszky, M. Koţar, I. Krajaĉić, S. 
1792, 1798 1145 1137 348 314 1300, 1454 1514 388 934, 988, 992 1930 1909 923 634 1694 1723 257 1798 1804 1321 934 497 332 966 1543 257 11 1125 1667 1538 1822 679 679 656 34, 40 1002 614 481 293, 1561 1045, 1051 1125 442 308 988, 992, 1582 1078 1180 1706 141, 699 46, 51 1729 907 303 308, 450 703 1137 348 269 57 1919 783 1056, 1741, 1774, 1852 269, 343, 348 274, 380 1222 Krajĉek Nikolić, K. Krajnović, S. Kralevska, K. Kralj, L. Kranjac, M. Krašna, M. Krelja Kurelović, E. Krini, A. Krini, J. Krini, O. Kriţanović, K. Krvavica, A. Kuĉak, D. Kudelić, R. Kukolja, D. Kumar A., A. Kumar Singh, S. Kurbalija, V. Kušundţija, E. Kutaladze, N. Kuzle, I. Kuznetsov, V.A. Kvatadze, R. Laanpere, M. Lacković, A. Lamza Maronić, M. Larionov, A. Lasorsa, I. Lassila, P. Lavriĉ, P. Lazić, N. Ledneva, A.Yu. Legvart, J. Lenac, K. Librenjak, S. Likar, Š. Lin, C. Linna, P. Livaja, I. Lo Presti, F. Lonie, A. Lopina, V. Lovrenĉić, S. Lovrić Švehla, Z. Luĉev Vasić, Ţ. Ludescher, T. Lugarić, T. Lugović, S. Lukac, D. Lukovac, B. M. Marić, Maĉek, D. Magerl, M. Major, L. Mäkelä, J. Mäkinen, T. Maksimkin, N.N. Malarić, R. Malatras, A. Malekian, R. Marasović, K. Marceglia, S. Marchewka, A. 1306 1502 673 863, 1258 383 957 1811 1386 1386 1386 472 416 442 1582 326 120 684 531 1909 236 1538 11, 16 236 1115 1218 1072 1578 719 515 401 783 11, 16 1629 730 1051 416 432 966 1538, 1895 509 298 1045 988 1663 289, 446 202 1369 1551, 1557 1012 141 574 230 110 907 881, 888, 1130 966 1315, 1321 146, 543 1623 707 970 719 1496 Marĉec, M. Marĉetić, D. Marek, J. Marić, F. Marinĉić, A. Marinović, M. Marjanović, D. Marković, I. Markulic, N. Marsi, S. Martinĉić, K. Martinĉić-Ipšić, S. Martinović, G. Martinović, Ţ. Martinović, Ţ. Matić, T. Mauher, M. Medved Rogina, B. Medved, M. Medvet, E. Meluch, L. Meng, L. Meshchanov, V.P. Meštrović, A. Mihajlović, Ţ. Mihaljević, B. Mihaylov, Gr.Y. Mihova, M. Mijaĉ, M. Mikhailov, A. Mikuchadze, G. Mikuliĉić, N. Mikulić, J. Milardić, V. Milišić, H. Milivojević, Z.N. Miljković, D. Mirković, M. Miškić-Pletenac, N. Mišlov, R. Mitrović, N. Mlakić, D. Mochizuki, H. Moĉibob, E. Modebadze, Z. Mohorĉiĉ, M. Molnar, G. Morita, S. Mostarac, P. Moţnik, D. Nagul, N. Nagy, L. Nakrap, I.A. Námestovski, Z. Nardelli, M. Nenadić, I. Nepovinnykh, E.A. Ni, W. Nikitović, A. Nikitović, M. Nikolovski, S. Nonis, R. Novak, M. 1002 1459 1084 1300 141, 667 472 407, 395 1300 72 1479 82 1904 1380 146, 543 146, 543 86 1869 353 411 1479 1443 1437, 1448 95 1904 277, 1091 970 550, 562 1137 772 1586 236 277 63 1413 1914 1547 1352, 1358 1203 730 110 68 1289 46, 51 1904 236 487 100 46, 51 146 1184 1566 57 95 907 503 614 196, 1884 289, 446 1532 1753 1289 90 901, 934 XXXIX Novković, I. Nozhenkova, L.F. Nummela, J. Nyarko, E.K. Obarĉanin, K. Ogrin, A. Okanović, V. Okreša Đurić, B. Oliveira, F. Omanović, S. Opiła, J. Ordanić, L. Orehovaĉki, T. Orešĉanin, D. Oreški, D. Orović, I. Oršolić, P. Osman, I. Osreĉki, Ţ. Ostojić, R. Ovseník, Ľ. Paćelat, I. Paić, G. Pajić, E. Palata, D. Pale, U. Palestri, P. Paľová, D. Panwar, S. Paroški, M. Pata, K. Pauk, L. Pauković, G. Paviĉić, T. Pavić, T. Pavlić, Z. Pavlina, K. Pein, B. Pellegrino, F.A. Petrova-El Sayed, M. Petrović, I. Petrovska, S. Pflanzner, T. Pighin, M. Pirker, J. Pobar, M. Poljak, N. Poplas Susiĉ, T. Popović, B. Popović, G. Popović, Ţ. Posavec, N. Prazina, I. Prohaska, Z. Prohaska, Z. Pröll, B. Pršeš, K. Pun, S.H. Pürcher, P. Putnik, Z. R, R. Raczkowski, K. Radan, A. 
XL 1650 1729 1397 1490 1294 1239 376 1603 5 745 361 1863 982, 1165 590 1582, 1667 569 116 1759 40 1294 1485 1254 1039 1909 614 1924, 1930 90 1006 1508 383 1115 1663 640 1262 791 1369 1039 982 1479 190 1300 1547 497 1375 929 1597 1895 456 370 1924 584 24 376 996 996 263 376 289, 432 929 766 120, 126, 130 72 1033 Radchenko, G.I. Radojević, B. Radošević, I. Raj R.S., R. Raj, B. Rajković, E. Rak, M. Ramljak, D. Ramljak, M. Ramponi, G. Rantanen, P. Rashkovska, A. Rembold, D. Repnik, R. Retschitzegger, W. Révészová, L. Ribarić, S. Rishiwal, V. Risteska Stojkoska, B. Ristov, S. Ristovski, A. Rizvić, S. Rolich, T. Romanenko, A.I. Romano, A. Rostamzadeh Hajilari, H. Rumyantseva, V.D. Rutkowski, A. Ruţić, I. Saari, M. Sáez-Trigueros, D. Salapura, S. Salfinger, A. Salopek Ĉubrić, I. Salopek, D. Samkharadze, I. Samotan, V. Sanchez, I. Savin, A.N. Schatten, M. Schatzberger, G. Schindler, F. Schuller, B. Schulze, J. Schwinger, W. Sedinić, I. Semenov, A. Sertić, M. Sharifi, M. Shilov, I.P. Shimojo, S. Shopov, M.P. Sidorov, I.A. Sikimić, U. Sillberg, P. Simić, I. Sinanović, H. Singh, R. Sirotić, I. Skala, K. Skendţić, A. Slak, J. Slamić, G. 196, 1884 761 1190 130 1469 1245 1936 751 1711 1479 1392, 1397 481 164 962 263 940 1459 1543 724, 1137 208, 218, 420 218 376 1591 11, 16 426 283 20 190 1235 1392 1437 821, 847 263 869 826 236 1899 1623 95 1603 63 1443 190 5 263 1663 293, 1561 1174 283 20 257 314, 1348 242, 247 383 1392, 1397 383 1914 1469 152 175, 298 1852 450 1017 Slavuj, V. Smrke, A. Soini, J. Soliman, M. Solus, D. Soviĉ, M. Sretenović, M. Sruk, V. Stanĉić, A. Staniĉ, U. Stanković, I. Stanković, S. Stapić, Z. Stockreiter, C. Stojanović, A. Stojanović, Ţ. Stóka, G. Stopjaková, V. Stoyanov, I.S. Strahonja, V. Strmeĉki, D. Strujić, Dţ. Suĉić, S. Sukhinskiy, I.V. Suligoj, T. Suljanović, N. Sun, Z. Sušanj, D. Svoboda, V. Sylejmani, K. Šabec, J. Šakić, K. Šarić, E. Šarić-Kekić, N. Šegmanović, F. Šegvić, M. Šerifović-Trbalić, A. Ševo, M. Šikić, M. Šimunić, D. Šipuš, D. Šitin, I. Škoda, P. Škorić, I. Škorjanc, T. Škorput, P. Škrbić, M. Škvorc, D. Šneler, L. Šojat, Z. Šokac, D. Šolić, K. Šoša Anić, M. Šovĉík, M. Špeh, I. Špoljarić, M. Špoljarić, T. Štajcer, M. Štajcer, M. Štefan Trubić, M. Štimac, M. Štokić, N. Štruc, V. 1056 1327 1392, 1397 894 1485 962 1741, 1774 1409 736 456 569 574, 569 772 110 783 1919 1125 57 550, 562 772, 1753 934 370 332 1884 34, 40, 46, 51 650 1448 253, 380 337, 1084 1527 622 1699 1914 1863 40 1306 462 1895 472 141, 667, 699 579 946 353 982 841 656 599, 606 1369 86 175 1196 1174 791 57 1834 1339 1339, 1889 590 590 1190 826 644 1433 Šurić, J. Talebi, M.M. Tan, Z.-H. Taneski, V. Tasiĉ, J. Taşpinar, N. Taylor, J. Teder, S. Telenta, M. Temerinac, M. Tepeš, B. Thürk, F. Tijan, E. Tikvić, I. Tokárová, I. Tomes, L. Tomić, S. Tomljanović, J. Tonkovikj, P. Topolĉić, L. Tošić, M. Tošić, M. Trbalić, A. Trengoska, J. Tretinjak, M. Trivodaliev, K. Trobec, R. Tucaković, Z. Tudić, V. Turán, J. Turina, T. Tusun, S. Tutek, Ţ. Ul’yanov, S.A. Ungermanns, C. Upadhyay, V.P. Urem, F. Uroda, I. Vai, M.I. Varga, M. Vaser, R. Veiga, L. Veispahić, A. Vejaĉka, M. Venuti, A. Vidas-Bubanja, M. Villari, M. Vinko, D. Vittori, M. Vlahović, N. Vlaović, J. Vojković, G. Vraĉić, T. Vrana, R. Vranešić, P. Vrbovĉan, I. Vresk, T. Vrhovec, S.L.R. Vuĉić, M. Vujisić, G. Vukmirović, S. Vuković, M. Vunarić, I. 
1262 283 1448 1680 420 558 298 1115 303 923 1039 1930 1769, 1852 1311 1443 929 383 1811 976 1222 521 628 462 707 1250 1137 481 1879 24 1485 1811 152 1215 1315 1121 1508 1222, 1225, 1254 996 289, 432 1786 472 634 703 1023 1271 1780 426 116, 135 1479 1572 135 1694 1786 875, 1067 1017, 1033, 1644 791 1403 436, 1364, 1635, 1639 100 1311 1161 1706 1230 XLI Vyroubal, V. Wambacq, P. Watashiba, Y. Weber, M. Weiss, B. Werth, W. Xhafa, V. Yakovleva, G.E. Yamanaka, H. Zaikin, O. Zdravevski, V. Zekanović-Korona, Lj. Zekić-Sušac, M. Zimmermann, H. XLII 736 72 257 713 110 1121 185 11 257 293, 1561 208 1841 1747 68 Zinterhof, P. Zmijanac, M. Zorić, B. Ţagar, M. Ţeţelj, B. Ţigulić, R. Ţilak, J. Ţilić, T. Ţitnik, T. Ţiţić, A. Ţlof, V. Ţunić, B. Ţunić, E. 320 1420 1380 1936 831 841 46, 51 814 640 946 821, 847 1521 1521 FOREWORD The 39th International ICT Convention MIPRO 2016 was held from 30th of May until 3rd of June 2016 in Opatija, the Adriatic Coast, Croatia. The Convention consisted of nine conferences under the titles: Microelectronics, Electronics and Electronic Technology (MEET), Distributed Computing, Visualization and Biomedical Engineering (DC VIS), Telecommunications & Information (CTI), Computers in Education (CE), Computers in Technical Systems (CTS), Intelligent Systems (CIS), Information Systems Security (ISS), Business Intelligence Systems (miproBIS), Digital Economy and Government, Local Government, Public Services (DE/GLGPS). A special conference was dedicated to the works of students: MIPRO Junior-Student Papers (SP). Along with this, special sessions on Biometrics & Forensics & De-Identification and Privacy Protection (BiForD) and Future Networks and Services (FNS) were also held as a part of convention MIPRO. The papers presented on these conferences and special sessions are contained in this comprehensive Book of Proceedings. All the papers were reviewed by an international review board. The list of reviewers is contained in the Book of Proceedings. All the positively reviewed papers are included in the Book of Proceedings. These papers were written by authors from the industry, scientific institutions, educational institutions, state and local administration. The convention was organized by the Croatian ICT Society MIPRO with the help of numerous patrons and sponsors to whom we owe our sincere thanks. We specially single out our golden sponsors Ericsson Nikola Tesla, T-Croatian Telecom and Končar-Electrical Industries and silver sponsor InfoDom. Our bronze sponsors are HEP–Croatian Electricity Company, Hewlett Packard, IN2, Transmitters and Communications and Storm Computers. To all who helped organizing the 39th International ICT Convention MIPRO 2016 as well as editing of this Book of Proceedings we extend our heartfelt thanks. Prof. Petar Biljanović, PhD International Program Committee General Chair XLIII MEET International Conference on MICROELECTRONICS, ELECTRONICS AND ELECTRONIC TECHNOLOGY Steering Committee Chairs: Željko Butković, University of Zagreb, Croatia Marko Koričić, University of Zagreb, Croatia Petar Biljanović, University of Zagreb, Croatia Members: Slavko Amon, University of Ljubljana, Slovenia Dubravko Babić, University of Zagreb, Croatia Maurizio Ferrari, CNR-IFN, Povo-Trento, Italy Mile Ivanda, Ruđer Bošković Institute, Zagreb, Croatia Branimir Pejčinović, Portland State University, USA Tomislav Suligoj, University of Zagreb, Croatia Aleksandar Szabo, IEEE Croatia Section INVITED PAPER (Si)GeSn Nanostructures for Optoelectronic Device Applications I. A. 
Fischer*, F. Oliveira*,**, A. Benedetti§, S. Chiussi§§ and J. Schulze*
* Institute of Semiconductor Engineering, Pfaffenwaldring 47, 70569 Stuttgart, Germany
** Centre of Physics, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
§ CACTI, Univ. de Vigo, Campus Universitario Lagoas Marcosende 15, Vigo, Spain
§§ Dpto. Fisica Aplicada, Univ. de Vigo, Rua Maxwell s/n, Campus Universitario Lagoas Marcosende, Vigo, Spain
fischer@iht.uni-stuttgart.de

Abstract – We present an overview of recent results on the fabrication of GeSn and SiGeSn nanostructures for optoelectronic device applications.

I. INTRODUCTION
Recent years have seen numerous experimental efforts directed at integrating photonics with electronics based on Group-IV optoelectronic devices. The use of Si and Ge in optoelectronics has been limited by their indirect bandgap, resulting in low efficiency. More recently, significant progress has been made in enhancing the optical properties of Group-IV alloys by including Sn. The unstrained binary alloy Ge1-ySny is predicted to become a direct bandgap semiconductor for y > 0.073 [1], while y > 0.17 is needed to obtain a direct bandgap material for Ge1-ySny grown pseudomorphically on Ge [2]. The growth of Ge1-ySny on Ge is challenging because of the 14.7 % lattice mismatch between α-Sn (with lattice constant aSn = 6.493 Å) and Ge (aGe = 5.658 Å). The existence of a direct bandgap material has been confirmed for a partially relaxed Ge0.874Sn0.126 layer on Ge [3]. Electrical and optical properties can be tuned further by adding Si (aSi = 5.431 Å), i.e., by investigating the ternary alloy. SixGe1-x-ySny can be grown without lattice mismatch on Ge for x = y·(aSn − aGe)/(aGe − aSi). Compared to Ge1-ySny, the material properties of the ternary alloy SixGe1-x-ySny, such as its composition-dependent bandgap, are much less understood and remain the subject of ongoing experimental efforts.
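The lattice-matching condition above follows from Vegard's law. As a quick illustration (not part of the original paper), the sketch below evaluates the Si fraction x required to compensate a given Sn fraction y, using the lattice constants quoted in the text.

```python
# Illustrative sketch: Si fraction x needed for SixGe(1-x-y)Sny to be
# lattice-matched to Ge, using Vegard's law and the lattice constants
# quoted above (in angstroms).
A_SN, A_GE, A_SI = 6.493, 5.658, 5.431

def si_fraction_for_lattice_match(y_sn: float) -> float:
    """Si fraction x that compensates a Sn fraction y for growth on Ge."""
    return y_sn * (A_SN - A_GE) / (A_GE - A_SI)

if __name__ == "__main__":
    for y in (0.05, 0.07, 0.12):
        print(f"y(Sn) = {y:.2f} -> x(Si) = {si_fraction_for_lattice_match(y):.2f}")
    # The ratio x/y is about 3.7, i.e. roughly 3.7 Si atoms per Sn atom.
```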
Nanostructures such as quantum wells and islands are well suited for application in optoelectronic devices because of the improved optical properties that originate from carrier confinement in one or more directions. The properties of nanostructures such as quantum wells and quantum dots for application in III-V semiconductor-based optical devices have led to the fabrication and commercialization of devices such as quantum-well and quantum-dot lasers, quantum cascade lasers and quantum-well infrared photodetectors. SiGe nanostructures such as quantum wells [4] and islands [5] have also been intensively investigated for applications in modulators or photodetectors.
The addition of Sn opens up exciting new possibilities for the use of Group-IV nanostructures in optical device applications. There are a number of theoretical proposals for lasers and infrared photodetectors based on (Si)GeSn multi-quantum-well structures [6]–[8]. While nanostructures are interesting for optoelectronic applications in their own right, fabricating and functionalizing nanostructures in the context of Group-IV optoelectronics could be a possible route towards obtaining high Sn content in such structures, with the concomitant advantages for optical efficiency. Group-IV nanostructures containing Sn have previously been fabricated using different techniques: Sn dots were grown on Ge [9], Ge1-ySny dots were grown on thin SiO2 layers on top of Si (111) substrates [10], and Sn nanostructures were embedded into a Si or Ge matrix by annealing of SiSn or GeSn films [11]. Here, we present a brief overview of our recent progress in fabricating (Si)GeSn nanostructures that can be fully embedded in Group-IV-based devices.

II. NANOSTRUCTURE FABRICATION
A. Molecular Beam Epitaxy
Molecular Beam Epitaxy (MBE) is a growth method that is ideally suited to material deposition with monolayer precision. The growth of GeSn and SiGeSn alloys is usually performed at very low substrate temperatures in order to prevent precipitation and segregation of Sn. Deposition temperatures of 100 °C – 160 °C are often selected for GeSn, while SiGeSn has also been grown at higher temperatures [12]. In bulk materials, the substrate temperature has a strong impact on layer quality and, thus, on device quality. For nanostructures, material diffusion and segregation is an additional concern that needs to be addressed when selecting growth temperatures [12]. While a higher growth temperature could also yield better layer quality, it can be expected to negatively affect the abruptness of heterotransitions.
All structures discussed here were grown using solid-source MBE at a base pressure lower than 10^-10 mbar. An electron beam evaporator was used for Si evaporation, while Knudsen cells were used for Ge and Sn deposition. The Si as well as the Ge flux are monitored in situ with a quadrupole mass spectrometer with a feedback loop for flux stabilization. The Sn flux is controlled by the cell temperature. The Si flux was calibrated by growing Si films on Si (100) wafers with a growth rate of 1 Å/s and measuring film thickness with a profilometer. The Ge flux was calibrated by growing a relaxed epitaxial Ge film on a Si (100) wafer with a slow flux of ≈ 0.1 Å/s and measuring film thickness by ellipsometric spectroscopy. The flux of Sn was calibrated by growing thin epitaxial films of Ge1-xSnx on Ge buffer layers on Si (100) wafers; the absolute concentration of Sn was subsequently measured using Rutherford backscattering spectroscopy.
All samples discussed in the following subsections were fabricated on 4” Si (100) wafers. After placing the wafers into the MBE chamber they were first subjected to a thermal desorption step at 900 °C for 5 minutes to remove the surface SiO2 layer. This was followed by the growth of 50 nm of Si to cover remaining surface contaminants and obtain a smooth surface for the subsequent growth steps. For (Si)GeSn layers grown on Ge it was necessary to form a Ge virtual substrate (VS) on the Si wafer to accommodate the lattice difference between Si and Ge and enable the growth of high-quality layers on Ge. This virtual substrate was formed in two steps. A Ge buffer layer was grown at 1 Å/s and 330 °C, followed by an annealing treatment at 850 °C to reduce the threading dislocation density and form a virtual substrate. A second layer of epitaxial Ge was then deposited to provide a smooth surface onto which high-quality layers could be grown.

Figure 1. (a) Schematic layer structure (stack: 100 nm n++-Si / 100 nm n++-Ge / Ge / SiGeSn multi-quantum wells / Ge / 100 nm p++-Ge (VS) / 400 nm p++-Si) and (b) TEM image of SiGeSn multi-quantum-well structures. The contrast indicates material transitions.
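To put the role of the Ge virtual substrate in perspective, the short sketch below (illustrative arithmetic, not taken from the paper) computes the lattice mismatches involved from the lattice constants quoted in the Introduction.

```python
# Illustrative arithmetic (not from the paper): lattice mismatches bridged
# by the Ge virtual substrate, using the lattice constants quoted above.
A_SN, A_GE, A_SI = 6.493, 5.658, 5.431  # angstroms

def mismatch(a_layer: float, a_substrate: float) -> float:
    """Relative lattice mismatch of a layer grown on a given substrate."""
    return (a_layer - a_substrate) / a_substrate

if __name__ == "__main__":
    print(f"Ge on Si:       {mismatch(A_GE, A_SI):.1%}")  # ~4.2 %
    print(f"alpha-Sn on Ge: {mismatch(A_SN, A_GE):.1%}")  # ~14.7-14.8 %, cf. the Introduction
```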
B. SiGeSn Multi-Quantum Well Structures
Fabricating SiGeSn multi-quantum-well (MQW) structures on Ge has the advantage that, while both well and barrier layers can be strained with respect to Ge, the compositions of barrier and well layers can be chosen in such a way that the net strain of the MQW structure on Ge is zero or close to zero. Such structures can, thus, make use of well-established Ge VS technology for integration on Si. We, therefore, investigated SiGeSn MQW structures containing two and four wells grown on Ge buffer layers at a substrate temperature of 160 °C [13]. The presence of the well and barrier layers can clearly be seen in TEM images (Fig. 1 (b)). Both barrier and well layers, each 10 nm thick, are composed of SixGe1-x-ySny with different fractions of the semiconductor materials (Si0.31Ge0.62Sn0.07 for the barrier and Si0.25Ge0.63Sn0.12 for the well layers). The material compositions were chosen for their bandgaps and such that the lattice constant of the unstrained well (barrier) ternary alloy is larger (smaller) than that of Ge in order to obtain a MQW layer stack with small residual strain on the Ge VS. Optoelectronic device functionality has been demonstrated for these structures by placing them in the intrinsic layer of a PIN-photodiode layer stack. The presence of the SiGeSn MQW structure can be seen to influence both photocurrent and electroluminescence measurements [13].

C. Sn-rich GeSn Multi-Quantum-Well Structures in Ge
Similarly to the growth of SiGeSn MQW structures, the fabrication of Ge/GeSn MQW structures can be achieved by sandwiching GeSn layers with thicknesses of a few nanometers between Ge spacer layers. Light-emitting diodes (LEDs) containing Ge/Ge0.93Sn0.07 MQW layers with varying thicknesses have been shown to emit light with much higher intensity than a reference Ge LED [14]. A particularly high Sn content in Ge/GeSn MQW structures can be achieved by depositing pure Sn layers on Ge with a total layer thickness that is below the critical layer thickness tc at which the transition from 2D to 3D growth (Stranski-Krastanov growth) sets in. Similarly to the growth of Ge on Si, which has been investigated for nearly three decades, the growth of Sn on Ge proceeds in two stages. In the first stage, Sn forms a fully strained 2D wetting layer on Ge. At a critical layer thickness, relaxation sets in via the formation of local, Sn-rich material accumulations: 3D island growth is obtained. Using Sn layers with thicknesses below tc and overgrowing them with 5 – 10 nm of Ge, ultra-thin multi-quantum-well layers with high Sn content can be fabricated [15]. Fig. 2 shows cross-sectional TEM images of a sample with 10 Sn-rich wells separated by 10 nm Ge spacer layers. The sample was fabricated by repeatedly depositing 2 ML of Sn and overgrowing them with 10 nm Ge at a constant substrate temperature of 100 °C. The resulting position-dependent Sn composition is likely to be influenced by material diffusion and Sn segregation but can be expected to be high.

Figure 2. (a) Schematic layer structure (Ge/GeSn multi-quantum wells on a 200 nm Ge (VS) on a p-Si substrate) and (b) TEM BF as well as (c) HR-TEM images of GeSn multi-quantum-well structures. The contrast indicates material transitions.

Figure 3. AFM measurements ((1 × 1) μm² scans) of (a) 2.0 and (b) 3.0 ML of Sn deposited on Ge (100), with dot densities of 32 dots/μm² and 1,214 dots/μm², respectively. Large-scale dot formation is observed for 3.0 ML of Sn.
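For a rough feel for the numbers, the sketch below (an illustrative estimate, not taken from the paper) converts the quoted monolayer count into a nominal well thickness and an average Sn sheet fraction per MQW period, assuming one monolayer on a (100) diamond-lattice surface corresponds to about a quarter of the Ge lattice constant and ignoring segregation and intermixing.

```python
# Illustrative estimate (not from the paper): nominal dimensions of one
# Ge/Sn multi-quantum-well period built from 2 ML of Sn and a 10 nm Ge
# spacer, assuming 1 ML ~ a_Ge/4 on a (100) surface and no intermixing.
A_GE_NM = 0.5658           # Ge lattice constant in nm
ML_NM = A_GE_NM / 4.0      # ~0.14 nm per monolayer

sn_ml, ge_spacer_nm = 2.0, 10.0
sn_nm = sn_ml * ML_NM
period_nm = sn_nm + ge_spacer_nm

print(f"nominal Sn well thickness  ~ {sn_nm:.2f} nm")
print(f"period thickness           ~ {period_nm:.2f} nm")
print(f"average Sn fraction/period ~ {sn_nm / period_nm:.1%} (locally much higher)")
```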
D. Sn-rich GeSn islands on Ge
When the critical thickness is exceeded during the deposition of pure Sn on Ge, we can observe 3D island growth. We investigated the transition from 2D to 3D growth of pure Sn on Ge by depositing 0 – 5 ML of Sn on Ge at a substrate temperature of 100 °C [15]. AFM images of two selected samples are shown in Figure 3. At these growth temperatures, we determined the critical thickness to be tc = 2.25 ML. For Sn layers with thicknesses larger than tc we observe the onset of large-scale dot formation, as shown in Fig. 3 (b). Only a few dots can be found for Sn layers with thicknesses below tc, as shown in Fig. 3 (a). No facets can be observed in AFM measurements. An interesting question is whether the Sn islands contain α-Sn, whose crystal structure is identical to that of Si and Ge, or β-Sn, which is the thermodynamically stable phase for temperatures above 13 °C. The fact that such layers overgrown with Ge show perfect crystallinity in TEM measurements seems to indicate that the islands observed in AFM measurements are indeed composed of α-Sn [15].

E. GeSn islands in Si
Finally, a strategy of producing GeSn islands on Si that closely mirrors the growth of self-assembled Ge islands on Si consists of growing a few MLs of Ge1-ySny on Si. Self-assembled GeSn islands were fabricated by depositing 5.5 ML of Ge0.96Sn0.04 on Si at a constant substrate temperature of 350 °C and overgrowing them with 10 nm of Si [16]. This growth sequence could be repeated up to four times until layer quality started to deteriorate [16]. TEM images of a sample with three stacked layers of self-assembled GeSn islands are shown in Fig. 4. At a thickness of 5.5 ML the deposited Ge0.96Sn0.04 layers exceed the critical thickness for island formation, and relaxation sets in via the local accumulation of GeSn. The resulting structures are clearly visible in the cross-sectional TEM images. Local material accumulation can be seen to be accompanied by a thinning of the wetting layer in the vicinity of the islands. The resulting island composition can, thus, be expected to be a result of position-dependent intermixing, as is the case with Ge layers deposited on Si. An experimental technique with sub-nanometer precision would be required to investigate the position-dependent composition of the resulting islands in detail.

Figure 4. (a) Schematic layer structure (GeSn dots embedded in Si on a Si substrate) and (b) STEM as well as (c) HR-TEM images of GeSn dots grown on Si and capped with Si. Local GeSn-rich islands form as a result of the lattice mismatch between Ge0.96Sn0.04 and Si.

III. CONCLUSION
We have explored several growth strategies for Sn-rich quantum wells and islands that can be fully embedded either in Si or Ge for future possible applications in electro-optic devices. We are able to fabricate Sn-rich nanostructures, some of which have already been integrated into diodes to demonstrate optoelectronic functionality. Future steps consist of improving layer growth and, most importantly, exploring experimental strategies to determine composition and strain on a nanometer and even sub-nanometer scale in order to be able to tailor those nanostructures for application needs.

ACKNOWLEDGMENT
We thank G. Capellini, M. Virgilio, T. Wendav and K. Busch for insightful discussions. This work was partly supported by the Portuguese Foundation for Science and Technology (FCT) through Strategic Project PEstC/FIS/UI0607/2013 and a PhD Fellowship (F. Oliveira).

REFERENCES
[1] L. Jiang, J. D. Gallagher, C. L. Senaratne, T. Aoki, J. Mathews, J. Kouvetakis, and J. Menéndez, “Compositional dependence of the direct and indirect band gaps in Ge1-ySny alloys from room temperature photoluminescence,” Semicond. Sci. Technol., vol. 29, p. 115028, 2014.
[2] A. A. Tonkikh, C. Eisenschmidt, V. G. Talalaev, N. D. Zakharov, J. Schilling, G. Schmidt, and P. Werner, “Pseudomorphic GeSn/Ge(001) quantum wells: Examining indirect band gap bowing,” Appl. Phys. Lett., vol. 103, p. 32106, Jul. 2013.
[3] S. Wirths, R. Geiger, N. von den Driesch, G. Mussler, T. Stoica, S. Mantl, Z. Ikonic, M. Luysberg, S. Chiussi, J. M. Hartmann, H. Sigg, J. Faist, D. Buca, and D. Grützmacher, “Lasing in direct-bandgap GeSn alloy grown on Si,” Nat. Photonics, vol. 9, pp. 88–92, Feb. 2015.
[4] Y.-H. Kuo, Y. K. Lee, Y. Ge, S. Ren, J. E. Roth, T. I. Kamins, D. A. B. Miller, and J. S. Harris, “Quantum-Confined Stark Effect in Ge/SiGe Quantum Wells on Si for Optical Modulators,” IEEE J. Sel. Top. Quantum Electron., vol. 12, pp. 1503–1513, Dec. 2006.
[5] K. L. Wang, Dongho Cha, Jianlin Liu, and C. Chen, “Ge/Si Self-Assembled Quantum Dots and Their Optoelectronic Device Applications,” Proc. IEEE, vol. 95, pp. 1866–1883, Sep. 2007.
[6] G.-E. Chang, S.-W. Chang, and S.-L. Chuang, “Strain-Balanced Multiple-Quantum-Well Lasers,” IEEE J. Quantum Electron., vol. 46, pp. 1813–1820, 2010.
[7] G. Sun, R. A. Soref, and H. H. Cheng, “Design of a Si-based lattice-matched room-temperature GeSn/GeSiSn multi-quantum-well mid-infrared laser diode,” Opt. Express, vol. 18, pp. 19957–19965, Sep. 2010.
[8] G. Sun, R. A. Soref, and H. H. Cheng, “Design of an electrically pumped SiGeSn/GeSn/SiGeSn double-heterostructure mid-infrared laser,” J. Appl. Phys., vol. 108, p. 33107, Aug. 2010.
[9] W. Dondl, P. Schittenhelm, and G. Abstreiter, “Self-assembled growth of Sn on Ge (001),” Thin Solid Films, vol. 294, pp. 308–310, Feb. 1997.
[10] Y. Nakamura, A. Masada, S.-P. Cho, N. Tanaka, and M. Ichikawa, “Epitaxial growth of ultrahigh density Ge1−xSnx quantum dots on Si (111) substrates by codeposition of Ge and Sn on ultrathin SiO2 films,” J. Appl. Phys., vol. 102, p. 124302, Dec. 2007.
[11] R. Ragan, K. S. Min, and H. A. Atwater, “Direct energy gap group IV semiconductor alloys and quantum dot arrays in SnxGe1−x/Ge and SnxSi1−x/Si alloy systems,” Mater. Sci. Eng. B, vol. 87, pp. 204–213, Dec. 2001.
[12] N. Taoka, T. Asano, T. Yamaha, T. Terashima, O. Nakatsuka, I. Costina, P. Zaumseil, G. Capellini, S. Zaima, and T. Schroeder, “Non-uniform depth distributions of Sn concentration induced by Sn migration and desorption during GeSnSi layer formation,” Appl. Phys. Lett., vol. 106, p. 61107, Feb. 2015.
[13] I. A. Fischer, T. Wendav, L. Augel, S. Jitpakdeebodin, F. Oliveira, A. Benedetti, S. Stefanov, S. Chiussi, G. Capellini, K. Busch, and J. Schulze, “Growth and characterization of SiGeSn quantum well photodiodes,” Opt. Express, vol. 23, p. 25048, Sep. 2015.
[14] B. Schwartz, M. Oehme, K. Kostecki, D. Widmann, M. Gollhofer, R. Koerner, S. Bechler, I. A. Fischer, T. Wendav, E. Kasper, J. Schulze, and M. Kittler, “Electroluminescence of GeSn/Ge MQW LEDs on Si substrate,” Opt. Lett., vol. 40, p. 3209, Jul. 2015.
[15] F. Oliveira, I. A. Fischer, A. Benedetti, P. Zaumseil, M. F. Cerqueira, M. I. Vasilevskiy, S. Stefanov, S. Chiussi, and J. Schulze, “Fabrication of GeSn-multiple quantum wells by overgrowth of Sn on Ge by using molecular beam epitaxy,” Appl. Phys. Lett., vol. 107, p. 262102, Dec. 2015.
[16] F. Oliveira, I. A. Fischer, A. Benedetti, M. F. Cerqueira, M. I. Vasilevskiy, S. Stefanov, S. Chiussi, and J. Schulze, “Multi-stacks of epitaxial GeSn self-assembled dots in Si: Structural analysis,” J. Appl. Phys., vol. 117, p. 125706, Mar. 2015.
PAPERS

Thermoelectric Properties of Polycrystalline WS2 and Solid Solutions of WS2-ySey Types
G.E. Yakovleva*, A.I. Romanenko*, A.S. Berdinsky**, A.Yu. Ledneva*, V.A. Kuznetsov**, M.K. Han***, S.J. Kim*** and V.E. Fedorov*
* Nikolaev Institute of Inorganic Chemistry, Russian Academy of Sciences, Novosibirsk, Russia
** Novosibirsk State Technical University/Semiconductor Devices & Microelectronics, Novosibirsk, Russia
*** Ewha Womans University/Dept. Chemistry and Nano Science, Seoul, Korea
e-mail: fed@niic.nsc.ru

Abstract – Transition metal chalcogenides are promising thermoelectric materials of great interest for applications. In this work, polycrystalline bulk WS2 and solid solutions of the WS2-ySey type have been studied. In contrast to the literature data obtained at higher temperatures, we have investigated the thermoelectric properties of these materials at low and middle temperatures (77–450 K). The temperature dependences of the electrical conductivity and the Seebeck coefficient were obtained from experimental data. The Seebeck coefficients of these materials reach high values; a maximum value of up to 2000 µV/K has been obtained.

I. INTRODUCTION
Nowadays, the main ecological problem is environmental pollution. Over 60% of energy is wasted worldwide, mostly in the form of waste heat [1]. Therefore, thermoelectric power sources are one of the promising fields of study. Thermoelectric materials have the ability to convert heat into electrical energy. The efficiency of thermoelectric materials is characterized by the dimensionless thermoelectric figure of merit ZT. This parameter depends on the electrical conductivity (σ), the Seebeck coefficient (S) and the thermal conductivity (λ).
The layered transition metal chalcogenides are typical 2-D solid materials in bulk form. Such materials have been used for many years as solid lubricants, photovoltaic and photocatalytic solar energy converters, and catalysts in many other industrial applications [2]. Nevertheless, the thermoelectric properties of layered transition metal chalcogenides are of great interest due to their low thermal conductivity. One of the brightest representatives of such materials is tungsten disulfide. Single-layer tungsten disulfide is a two-dimensional quasi-crystal that consists of a close-packed layer of tungsten in between two close-packed layers of sulfur, S-W-S. The layers are held together by weak van der Waals forces, whereas the atomic structures of the layers are tied together by strong covalent forces.
There are some papers devoted to the study of the thermoelectric properties of transition metal chalcogenides such as WS2 and WSe2, in which the authors mainly studied single crystals [3]. In paper [4] the authors investigated the transport properties of ternary mixed WS2-ySey single crystals. In our work we have researched the thermoelectric properties of polycrystalline solid solutions of WS2-ySey and W1-xNbxS2 for thermoelectric applications.
The work was supported by the Russian Science Foundation, grant 1413-00674.
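Since the text names the three quantities that enter ZT without restating the relation itself, the standard definition is given below for reference (a textbook formula, not quoted from the paper); its numerator S²σ is the power factor used later in the paper.

```latex
% Standard definitions (for reference; not restated in the paper's text):
ZT = \frac{S^{2}\sigma}{\lambda}\,T, \qquad \text{power factor } P = S^{2}\sigma .
```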
Single-layer tungsten disulfide is a two-dimensional quasi-crystal, that consists of close-packed layer of tungsten in between of two close-packed layers of sulfur S-W-S. Layers are held together due to weak Van der Waals forces, whereas atomic structures of layers are tied together by strong covalent forces. There are some papers devoted to the study of chalcogenides thermoelectric properties of transition Figure 1. XRD powder patterns of WS1.90Se0.10, WS1.80Se0.20, WS1.75Se0.25 The work was supported by Russian Science Foundation, grant 1413-00674. MIPRO 2016/MEET 11 The crystal system is hexagonal, space group is P63/mmc (no. 194). In chemistry of dichalcogenides this symmetry also called 2H-type of WQ2 (or MoQ2, where Q = S, Se, Te). There are also exist 3R-type corresponding to rhombohedral R3m (no. 160) space group. All obtained compound are 2H-type which means they have P63/mmc space group. If our samples were single-crystal, the diffraction reflexes would have been tight. Since the samples are fine powders, the broadening of reflexes occurs. The doping of Nb or Se leads to insignificant changes in unit cell parameters and small shift (about 0.1 degree) of reflexes in comparison with pure WS2. Combination of these two factors leads to a broadening of the reflections compared to pure WS2. B. Preparation of the samples and measurement techique Before measurements all samples have been kept under vacuum during 2 hours at 300° in order to remove absorbed water and oxygen from air. The XRD powder patterns of annealed samples completely agree with ones before evacuating. Thermal study of the samples shows stability of compounds up to 400°C (673 K). The powdered materials were pressed to 10 mm in diameter pellets-shape samples. The samples 10 × 2 ×2 mm in size were cut from the pellets. Silver paste was used in order to obtain ohmic contact with samples. The measurements of the electrical conductivity temperature dependence were performed by means of four-contact method. The thermopower was measured by means of static dc method. Thermal conductivity was determined by combining the thermal diffusivity D(T), specific heat Cp (T) and sample density ρ (T) according to κtot (T) = D(T) × Cp(T) × ρ(T). The thermal diffusivity D(T) and specific heat Cp(T) of several specimens were determined by the flash diffusivity-heat capacity method using NETZSCH LFA 457 MicroFlash™ instrument. The temperature ranges for temperature dependence of Seebeck coefficient and electrical conductivity are different due to technical capability of measuring instruments. current–voltage characteristics, magnetoresistance, etc.) of materials substantially vary with their decreasing to nanometer sizes [5-10]. Our investigations of the electrical properties of materials containing nanoparticles with characteristic sizes of the order of several nanometers in different dielectric matrices manifested the variation of not only electrical conductivity but electron transport mechanisms as well [11-14]. It was established by the example of numerous systems that conductivity in polycrystalline materials with a high contact resistance is performed by tunneling charge carriers between the crystallites separated by conducting barriers (contacts between the crystallites) [15, 16]. 
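As a minimal numerical illustration of the thermal-conductivity relation used in the measurement procedure above, κtot(T) = D(T)·Cp(T)·ρ(T), the following sketch combines hypothetical values of the measured quantities; these are placeholders for illustration only, not the data obtained with the NETZSCH LFA 457 MicroFlash instrument.

```python
import numpy as np

# Hypothetical placeholder measurements (illustrative only, not the measured data).
T = np.array([300.0, 350.0, 400.0, 450.0])        # temperature, K
D = np.array([0.60, 0.55, 0.50, 0.47]) * 1e-6     # thermal diffusivity D(T), m^2/s
Cp = np.array([240.0, 250.0, 258.0, 264.0])       # specific heat Cp(T), J/(kg K)
rho = np.array([7.40e3, 7.40e3, 7.39e3, 7.39e3])  # density rho(T), kg/m^3

# kappa_tot(T) = D(T) * Cp(T) * rho(T); the units combine to W/(m K).
kappa_tot = D * Cp * rho
for t, k in zip(T, kappa_tot):
    print(f"T = {t:.0f} K -> kappa_tot = {k:.2f} W/(m K)")
```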
If the crystalline islands are sufficiently large, the temperature dependence of conductivity σ(T) is described by the fluctuation model of tunneling, the fluctuation-induced tunneling conduction (FIT) [17]:

σ(T) = σ1∙exp[−Tt/(T + Ts)],  (1)

where the temperature Tt corresponds to the energy necessary for the electron transition between the crystallites (in fact, this transition is associated with overcoming an energy gap Eg ~ kB∙Tt); σ1 is the intrinsic conductivity and Ts is the temperature below which the conductivity reaches saturation [17]. The electrical conductivity of the samples was measured in a helium atmosphere in the temperature range from 77 K to 450 K. The measured results are displayed in Fig. 2. The straight lines are approximations of the low-temperature data by "(1)" with Ts = 5 K for all samples. The temperature dependences of conductivity demonstrate that the FIT mechanism makes the dominant contribution to the low-temperature conductivity of all samples. This leads us to the conclusion that the contact resistance between particles makes the main contribution to the resistivity of all samples. The energy gaps (Eg ~ kB∙Tt) of the WS2-ySey samples are presented in Fig. 3.

C. Electrical properties

Our bulk samples consist of a large number of pressed nanoparticles. The current flow in such a bulk sample proceeds both inside the nanoparticles and through the contacts between them. This leads to additional variation in the electron transport properties of massive samples that consist of numerous nanoparticles. In addition, a poorly conducting layer is formed on the surface of most nanodimensional electroconductive particles. In many cases, the electrical conductivity in arrays of such nanoparticles is determined mainly by the contact resistance. It is established experimentally that the electron transport properties (electrical conductivity, current-voltage characteristics, magnetoresistance, etc.) of materials vary substantially as their size decreases to nanometers.

Figure 2. Temperature dependence of conductivity of the WS2-ySey samples in coordinates of dependence "(1)".

The results of measurements on samples in which transition metal atoms in WS2 are replaced by Nb are presented in Fig. 5. Such partial substitution of metal atoms in WS2 leads to a decrease in its energy gap and changes its conductivity behavior to a metallic one. As seen from Figure 5, the material with composition W0.85Nb0.15S2 has an electrical conductivity approximately 10^3 times greater than that of the WS2-ySey materials at T = 320 K. The materials W0.85Nb0.15S2 and W0.95Nb0.05S2 have energy gap values of 0.0001 eV and 0.02 eV, respectively.

Figure 3. Material composition dependence of the WS2-ySey energy gap

D. Thermoelectric properties

The Seebeck coefficient of the samples was measured in the temperature range of 190 to 450 K. The results are presented in Fig. 4.

Figure 4. Temperature dependence of Seebeck coefficient of the WS2-ySey samples. The inset shows the results of measurements at low temperatures.

According to the obtained data, the maximum value of the Seebeck coefficient, S = 2000 µV/K, belongs to the material with composition WS1.8Se0.2. It also has the maximum values at low (S = -525 µV/K at T = 205 K) and middle temperatures (S = 1950 µV/K at T = 445 K). Such replacement of the chalcogen atom leads to an increase in the Seebeck coefficient.
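As an illustration of how σ(T) data can be fitted by "(1)" with Ts fixed at 5 K and the result converted into an energy gap via Eg ≈ kB∙Tt, the following sketch uses synthetic data; it is only illustrative of the procedure, not the authors' analysis code.

```python
import numpy as np
from scipy.optimize import curve_fit

kB = 8.617e-5  # Boltzmann constant, eV/K

def fit_conductivity(T, sigma1, Tt, Ts=5.0):
    """Fluctuation-induced tunneling conduction, Eq. (1)."""
    return sigma1 * np.exp(-Tt / (T + Ts))

# Hypothetical sigma(T) data in S/cm (illustrative only).
T = np.linspace(77.0, 450.0, 40)
sigma = fit_conductivity(T, sigma1=2.0e-2, Tt=900.0) \
        * np.random.default_rng(0).normal(1.0, 0.02, T.size)

# Fit sigma1 and Tt with Ts held at 5 K, as in the text.
popt, _ = curve_fit(lambda T, s1, Tt: fit_conductivity(T, s1, Tt, Ts=5.0),
                    T, sigma, p0=(1e-2, 500.0))
sigma1_fit, Tt_fit = popt
print(f"sigma1 = {sigma1_fit:.3e} S/cm, Tt = {Tt_fit:.0f} K, "
      f"Eg ~ kB*Tt = {kB * Tt_fit * 1e3:.1f} meV")
```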
According to our preliminary measurements, the thermal conductivity of WS2-ySey is about 1-1.2 W/m*K. Such materials have a large value of the Seebeck coefficient on the one hand, and very low value of the electrical conductivity on the other hand. Therefore, in order to increase the electrical conductivity we have replaced transition metal atoms W partially by atoms Nb. As we have shown in paper [18] such replacements significantly increased the electrical conductivity. The results of our work on measurements of replaced MIPRO 2016/MEET To evaluate the effectiveness of thermoelectric materials power factor was calculated by the formula “(2)”.  P = S2σ  where S – Seebeck coefficient (V/K), σ – electrical conductivity (S/cm). The results of the thermoelectric power factor calculation are presented in Fig. 6. One can see that these materials have not very high value of power factor in comparison with modern thermoelectric materials due to low electrical conductivity. But polycrystalline solid solutions of WS2-ySey have a very high Seebeck coefficient. These parameters are interdependent quantities [19] so a compromise between electrical conductivity and Seebeck coefficient needs to be found. We believe that replacement of metal atoms W in WS2 partially by atoms Nb should be an efficient way of tuning thermoelectric properties towards the system's optimum. 13 [2] [3] [4] [5] [6] Figure 6. Temperature dependence of power factor of the WS2-ySey samples. The inset shows the results of measurements at low temperatures. III. [7] CONLUSION A series of samples of compositions WS2-ySey (y = 0.1, 0.2, 0.25) and W1-xNbxS2 ( x = 0.05 and 0.15) were synthesized by means of ampoule high-temperature method. Electron transport properties of obtained nanocomposite bulk materials WS2 and solid solutions W1-xNbxS2, WS2-ySey at temperature range from 77 K to 450K were investigated. We have found that Seebeck coefficients of these materials have a high value. The maximum value of the Seebeck coefficient up to 2000µV/K has been obtained for composition WS1.8Se0.2. However, the power factor P was found to be low. We have found that in nanocomposite bulk materials W0.85Nb0.15S2 electrical conductivity increased by 103 times at room temperature, but the Seebeck coefficient was lower than in any of WS2-ySey . It was shown that it is possible to change thermoelectric and transport properties by controlling of Se and Nb contents in composites WS2-ySey and W1xNbxS2. In the scope of present work we succeed in increasing of the Seebeck coefficient by replacing chalcogen atoms in WS2 by the factor of 2. But despite such a significant change of the Seebeck coefficient, these bulk polycrystalline composite materials still cannot compete with modern thermoelectric materials such as Bi2Te3 thin films with P= 2000 µW/m*K at room temperature [20]. [8] [9] [10] [11] [12] [13] [14] [15] [16] ACKNOWLEDGMENT The work was supported by Russian Science Foundation, grant 14-13-00674. References [1] 14 M. G. Kanatzidis, “Nanostructured thermoelectrics: the new paradigm?,” Chemistry of materials, vol. 22(3), pp. 648–659, 2010. [17] [18] Haotian Wang, Hongtao Yuan, Seung Sae Hong, Yanbin Li and Yi Cui, “Physical and chemical tuning of two-dimensional transition metal dichalcogenides,” Royal society of chemistry, vol. 44, pp. 2664–2680, 2015. Jong-Young Kim, Soon Mok Choi, Won-Seon Seo and WooSeok Cho, “ Thermal and electronic properties of exfoliated metal chalcogenides,” Korean Chem.Soc, vol.3, pp.3225-3227, 2010. G. K. 
Solanki, D.N. Gujarathi, M. P. Lakshminarayana and M. K. Agarwal, “Transport property measurement in tungsten sulphoselenide single crystals grown by a CVT technique,” Crystal research and technology, vol. 43, pp. 179–185, 2008. Zaitsev-Zotov S.V., Pokrovskii V. Y., Monso P. “ Transition to 1D conduction with decreasing thickness of the crystals of TaS3 and NbSe3 quasi-1D conductors,” JETP Lett., vol. 73, pp. 29-32, 2001. Hor Y. S., Xiao Z. L., Welp U., Ito Y., Mitchell J. F., Cook R. E., et al. “Nanowires and Nanoribbons of Charge-DensityWave Conductor NbSe3,” Nano Letters, vol.5,pp. 397-401, 2005. Stabile A. A., Whittaker L, Wu T. L., Marley P. M., Banerjee S, Sambandamurthy G. “Synthesis, characterization, and finite size effects on electrical transport of nanoribbons of the charge density wave conductor NbSe3,”Nanotechnology, vol.22, pp. 1 -12, 2011. Monceau P. “Electronic crystals: an experimental overview,” Advances in Physics, vol.61, pp. 325-581, 2012. Romanenko A.I., Anikeeva O.B., Kuznetsov V.L., Obrastsov A.N., Volkov A. P., Garshev A. V. “Quasi-two-dimensional conductivity and magnetoconductivity of graphite-like nanosize crystallites,” Solid State Communications, vol.137, pp. 625629, 2006. Romanenko A. I., Anikeeva O. B., Buryakov T. I., Tkachev E. N., Zhdanov K. R., Kuznetsov V. L., et al. “Electrophysical properties of multiwalled carbon nanotubes with various diameters,” Phys Status Solidi B,vol.246, pp. 2641-2644, 2009. Chen J, Zhang G, Li B. “Impacts of Atomistic Coating on Thermal Conductivity of Germanium Nanowires,” Nano Lett, vol. 12, pp. 2826-2832, 2012. Romanenko A. I., Anikeeva O. B., Buryakov T. I., Tkachev E. N., Zhdanov K. R., Kuznetsov V. L., et al. “Influence of surface layer conditions of multiwall carbon nanotubes on their electrophysical properties,” Diamond & Related Materials, vol.19, pp. 964-967, 2010. Mazov I. N., Kuznetsov V. L., Moseenkov S. I., Ishchenko A. V., Rudina N. A., Romanenko A. I., et al. “Structure and Electrophysical Properties of Multiwalled Carbon Nanotube/Polymethylmethacrylate Composites Prepared via Coagulation Technique,” Nanoscience and Nanotechnology Letters, vol.3, pp. 18-23, 2011. Romanenko A. I., Fedorov V. E., Artemkina S. B., Anikeeva O. B., Poltarak P. A. “Temperature Dependences of Transport Properties of Films, Bulk Samples of Nanocrystals, and Single Crystals of Niobium Triselenide,” Physics of the Solid State. vol.57, pp. 1850-1854, 2015. Zhao Y, Li W. “Fluctuation-induced tunneling dominated electrical transport in multi-layered single-walled carbon nanotube films,” Thin Solid Films, vol.519, pp. 7987-7991, 2011. Romanenko A. I., Dybtsev D. N., Fedin V. P., Aliev S. B., Limaev K. M. “Electric-Field-Induced Metastable State of Electrical Conductivity in Polyaniline Nanoparticles Polymerized in Nanopores of a MIL-101 Dielectric Matrix,” JETP Letters, vol. 101, pp. 59-63, 2015. Sheng P. “Fluctuation-Induced Tunneling Conduction in Disordered Materials,” Physical Review B, vol.21, pp. 21802195, 1980. V.E. Fedorov, N.G. Naumov, A.N. Lavrov, M.S. Tarasenko, S.B. Artemkina, A.I. Romanenko and M.V. Medvedev, “Tuning Electronic Properties of Molybdenum Disulfide by a Substitution in Metal Sublattice,” 6th International Convention on Information & Communication Technology Electronics & Microelectronics,pp.11-14, 2013. MIPRO 2016/MEET [19] Miroslav Ocko, Sanja Zonja and Mile Ivanda “Thermoelectric materials: problems and perspectives,”MIPRO 2010, pp.16-21, 2010. 
[20] Jae-Hwan Kim, Jung-Yeol Choi, Jae-Man Bae, Min-Young Kim and Tae-Sung Oh “Thermoelectric characteristics of ntype Bi2Te3 and p-type Sb2Te3 thin films prepared by coevaporation and annealing for thermopile sensor application,” Materials Transactions,vol.54, pp. 618-625, 2013 MIPRO 2016/MEET 15 Piezoresistive Effect in Polycrystalline Bulk and Film Layered Sulphide W0.95Re0.05S2 V. A. Kuznetsov*, **, a, A. I. Romanenko*, A. S. Berdinsky**, A. Yu. Ledneva**, S. B. Artemkina**, V. E. Fedorov**, ***, b Novosibirsk State Technical University / Semiconductor Devices and Microelectronics, Novosibirsk, Russia Nikolaev Institute of Inorganic Chemistry, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia *** Novosibirsk State University, Novosibirsk, Russia a e-mail address: vitalii.a.kuznetsov@gmail.com b e-mail address: fed@niic.nsc.ru * ** Abstract - The paper reports the results of an experimental investigation of electron transport properties and piezoresistive effect of polycrystalline film and bulk samples of tungsten-rhenium disulphide W0.95Re0.05S2. Polycrystalline powder of the composition was synthesized by direct high temperature reaction of elements with stoichiometric ration. The film samples were prepared by ultrasonication of the powder in 35% ethanol with subsequent spraying of the colloidal dispersion onto preheated substrates. The bulk samples were formed by conventional compress technology at a pressure of 1.25 GPa. The strain gauge factor is equal to 13 and 19 for film and bulk samples, respectively. The band gaps were estimated from temperature dependences of conductivity to be about 360 and 470 meV, respectively. I. INTRODUCTION New functional materials are very prospective for microelectronic application. Sensor electronics is one of the most interesting branches to use unusual properties of functional materials. Traditionally metal and semiconductor strain sensors are used to measure mechanical quantities. Strain gauge factor (SGF) is one of the main parameters to evaluate efficiency of strain sensors. The SGF of semiconductor sensors is one-two orders more than of the metallic ones, as well as SGF of monocrystalline semiconductor samples is several times more than of polycrystalline ones [1, 2]. One of the famous layered materials is graphene and graphene-based compounds. Electron transport and strain sensing properties of such materials were investigated quite well [3-10]. Similar to graphene transition metal dichalcogenides (TMDC) have layered structure, but their electron transport and strain sensing properties are not well known. Electron transport properties of the thin polycrystalline films are different in comparison with bulk samples, because when passing from separate micro- or nanosized monocrystalline particles to polycrystalline arrays of them, the current flow in a bulk sample will perform both inside the particles and through the contacts between them. That leads to the additional variation in the electron The study was supported by the Russian Science Foundation (Grant no. 14-13-00674). 16 transport properties of massive samples that consist of numerous nano- and microcrystallites. In addition, a low conducting layer is formed on the surface of the most of nanodimensional electroconductive particles. In many cases, the electrical conductivity in arrays of such particles is determined mainly by the contact resistance. 
It is established experimentally that the electron transport properties (electrical conductivity, current–voltage characteristics, magnetoresistance, etc.) of materials substantially vary upon their decreasing to nanometre sizes [11-15]. It was established by the example of numerous systems that conductivity in polycrystalline materials with a high contact resistance was performed by tunnelling charge carriers between the crystallites separated by conducting barriers (contacts between the crystallites) [16, 17]. Such electron transport is described by fluctuation induced tunnelling conduction (FITC) model with several parameters to be estimated [18]: σ(T) = σ1 ∙ exp[ – Tt / (T + Ts)], (1) where Tt is the temperature which corresponds to the energy necessary for the electron transition between the crystallites and Ts is the temperature, below which the conductivity reaches the saturation. Recently we investigated electron transport properties of bulk polycrystalline samples of transition metal dichalcogenides (TMDC) and strain sensing properties of film samples of molybdenum-rhenium disulphide [19, 20]. But we have not investigated bulk samples for strain sensing properties. Tungsten disulphide WS2 is one of bright representatives from layered materials family. 2HWS2 polymeric three-atom thick (S-W-S) layers are bound to the neighbouring layers via van der Waals S…S bonding. Due to the presence of weak van der Waals interaction MoS2 and WS2 can be dispersed under ultrasonic treatment in liquid medium forming colloidal dispersions with nanosized sheets. That exfoliation method allows producing of stable colloidal dispersions with the particles a few layers thick and tens to hundreds nanometre in lateral size [21], and such dispersions can be used to form thin films, which consist of the nanoparticles. MIPRO 2016/MEET Figure 1. X-ray powder diffraction pattern of W0.95Re0.05S2 synthesized in high-temperature ampoule approach Figure 2. X-ray powder diffraction pattern of the W0.95Re0.05S2 film sprayed onto preheated amorphous quartz glass substrate The goal of this work is to investigate strain gauge factor and electron transport properties of film and bulk samples of tungsten disulphide doped with rhenium. 2 mm in width. Samples for studying strain sensing properties were glued to a beam of uniform strength (in bending) like in [22]. The glue was polymerized according to its bonding technology with the maximum temperature of 180°C. Before heating samples were covered additionally with the glue to reduce an environmental influence. II. EXPERIMENTAL A. Preparation of W0.95Re0.05S2 film and bulk samples We used the same methods and apparatuses as we did in [22]. W0.95Re0.05S2 was synthesized via ampoule hightemperature method from stoichiometric mixture of the elements. The synthesized W0.95Re0.05S2 was analysed by X-ray powder diffraction method. XRD-analysis has shown the single phase with some broadening of reflections (see Fig. 1). To prepare film samples 1.0 g of the powder was placed into a glass flask with 250 ml of ethanol-water solution (35 / 65% in volume) and ultrasonicated for 48 h. The resulting mixture was centrifuged at 1600 rpm for 25 min. The colloidal dispersion was sprayed onto a preheated to 180°C steel beam covered with polymer glue VL-931 like in [22]. Thickness of the samples resulted was estimated by weighting to be about 1 μm. For studying electron transport properties the dispersion was sprayed the same way onto Al2O3 polished substrates. 
The XRD pattern of thin film sample sprayed onto quartz glass polished is shown in Fig. 2. As one can see from Fig. 2 only 00l peaks are visible and it is indirect proof that the particles in films sprayed are oriented generally in parallel to the substrates. Bulk samples were formed of the powder with a laboratory hydraulic press at a pressure of 1.25 GPa. Thickness of the tablets was about 0.25 mm. To investigate the properties the tablets were cut to strips TABLE I. Samples SAMPLES’ PARAMETERS Sprayed Pressed 1 250 11000 166 Band gap, Δ (meV) 470 360 Strain gauge factor, K 13.0 19.5 Thickness, d (μm) Resistivity, ρ (Ohm·cm) MIPRO 2016/MEET B. Electron transport properties The temperature dependences of resistivity were measured using four-point probe method for bulk samples and two-point probe method for film ones. Contacts to the samples were made of the silver paste and thin gold wires. Two-point probe method was used for films instead of four-probe one because of high resistance of the films, but it should be noted that the contacts were ohmic. The dependences measured are shown in Fig. 3 and Fig. 4 in different axes. As one can see from Fig. 4 the experimental curves are well fitted by the equation (1). That exponential dependence is typical for fluctuation induced tunnelling conduction with the temperature Tt corresponding to the energy necessary for the electron transition between the crystallites. In fact that transition is associated with overcoming energy gap Δ ~ kB ∙ Tt, the energy gap an electron has to tunnel through from one crystallite to another. The energy gaps Δ were estimated from the slopes of the fitting straight lines (see Fig. 4). The results are shown in Table 1. As intercrystalline boundaries are most likely to be smaller in tablets in contrast to films due to pressing, it seems reasonable to say that the band gaps difference is concerned with the difference of the contacts. C. Strain sensing properties The device for measuring SGF was the same as we used in [20, 22]. Electrical contacts to the samples were made of silver paste “Dotite D-500” and thin copper wires. The contacts were ohmic. Every compressiontension cycle was lasted for 10 minutes with loading and unloading period being the same. The dependences of resistance on strain are shown in Fig. 5 for film samples in comparison to bulk ones. SGF was defined as slope of the line connecting two extreme values. The results are shown in Table 1. One can see that SGF of bulk samples is more 17 Figure 3. Temperature dependences of conductivity of the film and bulk samples Figure 4. Temperature dependences of conductivity with the fitting lines for energy gaps estimation by equation (1) than SGF of thin films. It was shown above that the intercrystalline contacts were different in bulk and film samples, and most likely the reason of the difference in the SGF is different contribution of the crystallites themselves and the contacts between them to changing of electrical resistance of the bulk and film samples when mechanical strain is applied. Especially as particles of layered compounds a few layer thick like ones the films involved consist of have great Young’s modulus [23]. Anyway the difference should be estimated with a conventional beam because of small thickness of the beams used and relatively large thickness of the bulk samples. electrical conductivity of bulk samples in contrast to films as a result of better densification of the crystallites due to pressing. 
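As an illustration of how the strain gauge factor is obtained as the slope of the line connecting the two extreme points of a resistance-strain characteristic such as the one in Fig. 5, the sketch below uses hypothetical numbers rather than the measured curves.

```python
import numpy as np

def gauge_factor(strain, resistance):
    """Strain gauge factor K = (dR/R0)/d(strain): the slope of the line joining
    the two extreme points of the R(strain) characteristic, with R0 taken at
    (or nearest to) zero strain."""
    i_lo, i_hi = np.argmin(strain), np.argmax(strain)
    r0 = resistance[np.argmin(np.abs(strain))]
    return ((resistance[i_hi] - resistance[i_lo]) / r0) / (strain[i_hi] - strain[i_lo])

# Hypothetical compression-tension cycle (illustrative numbers only).
strain = np.array([-4e-4, -2e-4, 0.0, 2e-4, 4e-4])                    # dimensionless
resistance = np.array([10950.0, 10975.0, 11000.0, 11030.0, 11060.0])  # Ohm

print(f"K = {gauge_factor(strain, resistance):.1f}")
```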
The SGF of bulk samples is higher than the factor of films. It can be circumstantial evidence that intercrystalline boundaries make great contribution in piezoresistive effect. III. CONLUSION We have studied piezoresistive effect and electron transport properties of film and bulk polycrystalline tungsten-rhenium disulphide W0.95Re0.05S2 samples. Temperature dependences of conductivity are well described by fluctuation induced tunnelling conduction model. The band gaps have been estimated to be about 470 and 360 meV for film and bulk samples, respectively. The strain gauge factors of film and bulk samples have been estimated from experimental curves to be about 13 and 19, respectively. The difference in energy gaps can be explained by lesser contribution of contact resistance in Figure 5. Resistance-strain characteristic of W0.95Re0.05S2 film and bulk samples at room temperature in ambient 18 REFERENCES [1] N. Maluf and K. Williams, An Introduction to Microelectromechanical Systems Engineering. Boston: Artech House Inc., 2004. [2] M. Elwenspoek and R. J. Wiegerink, Mechanical microsensors. Berlin: Springer-Verlag Berlin Heidelberg, 2001. [3] S. D. Sarma, A. K. Geim, P. Kim, and A. H. MacDonald, "Special edition: Exploring graphene - Recent research advances," Solid State Communications, vol. 143, pp. 1-126, 2007. [4] B. Partoens and F. M. Peeters, "From graphene to graphite: Electronic structure around the K point," Physical Review B, vol. 74, pp. 075404-1-10, 2006. [5] C. Gomez-Navarro, R. T. Weitz, A. M. Bittner, M. Scolari, A. Mews, M. Burghard, et al., "Electronic transport properties of individual chemically reduced graphene oxide sheets," Nano Letters, vol. 7, pp. 3499-3503, 2007. [6] X. Du, I. Skachko, A. Barker, and E. Y. Andrei, "Approaching ballistic transport in suspended graphene," Nature Nanotechnology, vol. 3, pp. 491-495, 2008. [7] G. Eda, G. Fanchini, and M. Chhowalla, "Large-area ultrathin films of reduced graphene oxide as a transparent and flexible electronic material," Nature Nanotechnology, vol. 3, pp. 270-274, 2008. [8] G. T. Pham, Y.-B. Park, Z. Liang, C. Zhang, and B. Wang, "Processing and modeling of conductive thermoplastic/carbon nanotube films for strain sensing," Composites Part B-Engineering, vol. 39, pp. 209-216, 2008. [9] N. Hu, Y. Karube, M. Arai, T. Watanabe, C. Yan, Y. Li, et al., "Investigation on sensitivity of a polymer/carbon nanotube composite strain sensor," Carbon, vol. 48, pp. 680-687, 2010. [10] A. Bessonov, M. Kirikova, S. Haque, I. Gartseev, and M. J. A. Bailey, "Highly reproducible printable graphite strain gauges for flexible devices," Sensors and Actuators A-Physical, vol. 206, pp. 75-80, 2014. [11] P. Monceau, "Electronic crystals: an experimental overview," Advances in Physics, vol. 61, pp. 325-581, 2012. [12] A. A. Stabile, L. Whittaker, T. L. Wu, P. M. Marley, S. Banerjee, and G. Sambandamurthy, "Synthesis, characterization, and finite size effects on electrical transport of nanoribbons of the charge density wave conductor NbSe3," Nanotechnology, vol. 22, p. 485201 (6pp), 2011. MIPRO 2016/MEET [13] A. I. Romanenko, O. B. Anikeeva, V. L. Kuznetsov, A. N. Obrastsov, A. P. Volkov, and A. V. Garshev, "Quasi-twodimensional conductivity and magnetoconductivity of graphitelike nanosize crystallites," Solid State Communications, vol. 137, pp. 625-629, 2006. [14] Y. S. Hor, Z. L. Xiao, U. Welp, Y. Ito, J. F. Mitchell, R. E. Cook, et al., "Nanowires and Nanoribbons of Charge-Density-Wave Conductor NbSe3," Nano Letters, vol. 5, pp. 397-401, 2005. 
[15] S. V. Zaitsev-Zotov, V. Y. Pokrovskii, and P. Monceau, "Transition to 1D Conduction with Decreasing Thickness of the Crystals of TaS3 and NbSe3 Quasi-1D Conductors," JETP Lett., vol. 73, pp. 29-32, 2001. [16] A. I. Romanenko, D. N. Dybtsev, V. P. Fedin, S. B. Aliev, and K. M. Limaev, "Electric-Field-Induced Metastable State of Electrical Conductivity in Polyaniline Nanoparticles Polymerized in Nanopores of a MIL-101 Dielectric Matrix," JETP Letters, vol. 101, pp. 59-63, 2015. [17] Y. Zhao and W. Li, "Fluctuation-induced tunneling dominated electrical transport in multi-layered single-walled carbon nanotube films," Thin Solid Films, vol. 519, pp. 7987-7991, 2011. [18] P. Sheng, "Fluctuation-Induced Tunneling Conduction in Disordered Materials," Physical Review B, vol. 21, pp. 21802195, 1980. MIPRO 2016/MEET [19] V. E. Fedorov, N. G. Naumov, A. N. Lavrov, M. S. Tarasenko, S. B. Artemkina, and A. I. Romanenko, "Tuning electronic properties of molybdenum disulfide by a substitution in metal sublattice," in 36th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 2013, pp. 11-14. [20] V. A. Kuznetsov, A. S. Berdinsky, A. Y. Ledneva, S. B. Artemkina, M. S. Tarasenko, and V. E. Fedorov, "Strain-sensing element based on layered sulfide Mo0.95Re0.05S2," in 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 2015, pp. 15-18. [21] V. Nicolosi, M. Chhowalla, M. G. Kanatzidis, M. S. Strano, and J. N. Coleman, "Liquid Exfoliation of Layered Materials," Science, vol. 340, pp. 1420-+, Jun 21 2013. [22] V. A. Kuznetsov, A. S. Berdinsky, A. Y. Ledneva, S. B. Artemkina, M. S. Tarasenko, and V. E. Fedorov, "Film Mo0.95Re0.05S2 as a strain-sensing element," Sensors and Actuators A: Physical, vol. 226, pp. 5-10, 2015. [23] A. Castellanos-Gomez, M. Poot, G. A. Steele, H. S. J. van der Zant, N. Agrait, and G. Rubio-Bollinger, "Elastic Properties of Freely Suspended MoS2 Nanosheets," Advanced Materials, vol. 24, pp. 772-775, 2012. 19 Luminescent diagnostics in the NIR-region on a base of Yb-porphyrin complexes V.D. Rumyantseva1,2, I.P. Shilov2, Yu.V. Alekseev3, A.S. Gorshkova1 1. Moscow Technological University, 119454 Moscow, Russia. E-mail: vdrum@mail.ru 2. Kotel’nikov Institute of Radioengineering and Electronics RAS, 141190 Fryazino, Moscow region, Russia. E-mail: ipshilov@ms.ire.rssi.ru 3. State Scientific Center of Laser Medicine, Moscow, Russia. E-mail: ural377@mail.ru The problem of early diagnostics, necessary for successful therapy of oncological diseases, is well known. In this field an optical diagnostic methods are the most effective. The optical response of a luminescent label can indicate the condition of biological tissues and biochemical processes occurring in them in real time. The most promising luminescent labels for non-invasive diagnostics are those that absorb and emit light in the NIR-region (~700-1100 nm), where the biological tissues absorption and auto-fluorescence are minimal. One of the most promising studied by us label is Yb-complex of 2,4dimethoxyhematoporhyrin IX, it possess the optimum chemical and photophysical properties such as a tumorotropic, a high molar extinction coefficient, a large Stokes shift, an effective luminescence, chemical and light stability and an ability to be used in aqueous media. 
Therefore, a special attention is given to luminescent diagnostics method development for endoscopic and visually available cancer forms on a base of Yb-complex of 2,4dimethoxyhematoporhyrin IX and laser-fiber NIR-range fluorimeter. La3+ luminescent level lies below T1-porphyrin level, are of the most interest. Porphyrins complexes with Er, Nd and Yb possess 4fluminescence in the near-infrared region (NIR-region) of spectrum, which is become possible because of an intramolecular energy transfer from the triplet state of a porphyrin (located in the range of 12 500-13 500 cm-1) to the lower resonance levels of Er3+, Nd3+ and Yb3+ (6 450, 11 500 and 10 200 cm-1 respectively) (fig. 1) [4]. I. INTRODUCTION Various porphyrin compounds perform important functions, which are indispensable for the existence and development of flora and fauna on Earth. At the end of the XX century porphyrins have been used in photodynamic therapy and diagnostics of malignant tumors due to their ability to accumulate in various types of cancer cells and tumor microvessels. For today a whole series of photosensitizers effectively generating a singlet oxygen, were synthesized: Photofrin II, Photogem, Foscan, Fotoditazin, Photolon, Radachlorin, Photosens, Alasens, Tookad and others. However, the free bases of these macroheterocycles have a side effect - phototoxicity, which unfavorable influence during diagnostics procedures. It is possible to overcome that drawback by use of diagnostics photosensitizers which are practically do not regenerate a singlet oxygen, while maintaining a high affinity to malignant tumors. Ytterbium complexes of natural and synthetic porphyrins are such compounds [1, 2]. II. RESULTS AND DISCUSSION Lanthanide luminescence weak enough by itself is significantly enhanced in a porphyrins metallocomplexes. It is connected to the transfer of macrocycle excitation energy to the La3+ ion [3]. Among lanthanides the erbium, neodymium and ytterbium complexes whose 20 Figure 1. Diagram of energy levels and luminescence spectra of Er3+, Nd3+ и Yb3+ ions. Ytterbium porphyrins complexes have been chosen as research objects due to the fact that under excitation of the π-electron part of molecule the luminescence which occur from transitions 2F5/2 → 2F7/2 of the Yb3+ 4felectron level (2F5/2 - excited state, 2F7/2 - the ground state), is observed. Introduction of ytterbium ion in a porphyrin leads to reduce of photochemical activity, but the selectivity of accumulation in tumors, characteristic to the most of porphyrins, still remains. The reduction of a singlet oxygen quantum yield is causes by that the luminescent level of Yb3+ ion lies rather below the triplet state of the molecule organic part, but higher than that of a singlet oxygen. As a result, the porphyrin matrix excitation under the influence of external light radiation is not transferred to oxygen, but intercepted by Yb 3+ ion, thereby strongly reducing the sensitized by porphyrin singlet oxygen generation [5]. These transformations are shown in fig. 2. MIPRO 2016/MEET Figure 2. Scheme of electronic transitions of porphyrin sensitizers and singlet oxygen generation: (1) absorption, (2) fluorescence, (3) intercombination conversion, (4) phosphorescence, (5) excitation transfer to oxygen and the transition of triplet oxygen 3O2 to singlet oxygen, (6) excitation transfer to the Yb3+ ion, and (7) luminescence of the Yb3+ ion. 
Porphyrin molecules form stable complexes with ytterbium ions which have intense absorption in the near infrared region (NIR-region) of spectrum [6]. The extinction coefficient (ε) for Yb-complexes is 104-105 M1 cm-1 that is almost 4 orders higher than ε value under direct excitation of Yb3+ ion itself, so one can assume that Yb3+ ion excitation through porphyrin matrix provides the most effective way for strong 4f-luminescence than direct excitation of Yb3+. At the same time, introduction of various substituents in meso and/or β-positions of macrocycle allows to drastically modify the physicochemical properties of lanthanide porphyrins complexes, that plays the important role in medicine and photochemistry use [7]. Figure 3. Structural formula of dipotassium salt of Yb-2,4dimethoxyhematoporphyrin IX complex. In studies of ytterbium ions infrared luminescence in complexes solutions with organic reagents the main purpose is to minimize a non-radiative loss of excitation energy. In macrocyclic ligands a complexing ion is effectively protected from effects of high frequency O-H and C-H oscillations of solvent molecules, which play an important role in a non-radiative degradation of electronic excitation energy. The first studies of ytterbium porphyrins complexes as luminescent markers on animals with malignant tumors were carried out on liposomal forms of coproporphyrin III, protoporphyrin IX, hematoporphyrin IX as their methyl ethers [8], as well as water-soluble synthetic derivatives of tetraphenylporphyrin [9]. Ytterbium porphyrins complexes have the characteristic for rare-earth ions narrow and rather bright luminescence line, which for Yb3+ ion locates in the infrared range of 975-985 nm in a "therapeutic window of biological tissues transparency", where their own luminescence is practically absent. The lifetime (τ) for the Yb-2,4-dimethoxyhematoporphyrin IX complex was 11 μs [13], the luminescence decline have a nonexponential pattern that is due to a strong luminescence quenching by a fluctuations of OH-groups from an inner environment of the ytterbium ion. A significant difference in lifetimes of the excited state of ytterbium complexes of porphyrins hydrophobic ethers and their acids is caused by a presence of intermolecular hydrogen bonds and luminescence quenching by water [14]. Continuing these studies more than two dozen ytterbium complexes of natural and synthetic porphyrins were synthesized [10]. The analysis of physical, chemical and luminescence characteristics and the results of biological tests revealed that one of the most promising compounds for diagnostic purposes is the dipotassium salt of Yb-2,4-dimethoxyhematoporphyrin IX complex [11] (fig. 3). To increase diagnostic potential of ytterbium porphyrin complexes it is necessary to isolate them from the quenching effect of a water environment whenever possible. A preferred solvent for such compounds may be DMSO, which has unique biomedical and pharmacological properties: it penetrates through biological membranes, improves transport properties of drugs, stimulates immune system. This complex is similar in structure to natural protoporphyrin IX, which iron complex is the hemoglobin prosthetic group. The substance is low toxic, well soluble in water, its synthesis based on blood hemin, is simple and cheap [12]. The emission luminescence spectra of the Yb-2,4dimethoxyhematoporphyrin IX complex in aqueous solutions with different concentrations of DMSO are shown on fig. 4. 
MIPRO 2016/MEET 21 mixtures in various proportions were used. The Ybcomplex pharmaceutical composition was found to accumulate fast enough (less than 1 hour) in places with pathological changed skin and mucous membranes. Herewith the clear luminescence intensity difference was established compared to healthy tissue (900-1100 nm range). The luminescence parameters change depends on a measurement time and a pathological process character [17]. This method can be successfully applied in dermatology, dentistry, gynecology, veterinary and other fields of medicine; it is characterized by simplicity of performance, availability, informativity and low toxicity. The best results of accumulation in pathological changed skin were obtained for the pharmaceutical composition based on Tisolum. The contrast index value was in range 3.0-15.0. III. CONLUSION Figure 4. Emission luminescence spectra of the Yb-2,4-dimethoxyhematoporphyrin IX in aqueous solutions with different DMSO concentrations: 1 - 100% DMSO, 2 - 50% DMSO, 3 - 20% DMSO, 4 - 100% H2O. Under conditions of lower polarity (solutions with growing concentration of DMSO) emission maxima are shifted toward long-wave spectrum region (solvatochromism phenomenon) [15]. From fig. 4 one can see that the luminescence intensity increases significantly with increasing of DMSO concentration (> 10 times in transition from Yb-complex aqueous solution to 100% DMSO solution), and the emission spectrum maximum shifts to almost 10 nm at the same time. The lifetime in 100% DMSO solution was ~ 22 μs. А 20-30% aqueous DMSO solutions are allowed in medicine and are of practical interest in use them for intravenous injections. For them τ ~ 5÷10 μs. Study of a photosensitized luminescence kinetic signals of singlet oxygen in aqueous solutions showed that the quantum yield of a singlet oxygen generation for the Yb-2,4-dimethoxyhematoporphyrin IX complex reduces almost 4 times (to 11%) from 40% for a free base porphyrin, that experimentally confirms its low phototoxicity. Previously in in vivo experiments on mice with subcutaneously grafted sarcoma 5-37 it was found by luminescence in the NIR-region the preferential Ybcomplex accumulation in tumor tissue compared with normal one [16]. The injection of the dipotassium salt of Yb-complex was carried out intravenously, the luminescent accumulation contrast was determined after 48 hours. Continuing the studies we designed the amphiphilic pharmaceutical compositions in the form of gels for epikutan use as well as for application to mucous membranes. The optimum concentration of the Yb-2,4dimethoxyhematoporphyrin IX complex (~ 0.05%, w/w) was received, gels Tisolum, Kalgel, Cremophor and their 22 Thus, ytterbium porphyrins complexes are promising diagnostic markers of malignant tumors and pathological changes of skin and mucous membranes in the NIRregion of spectrum and possess a low phototoxicity. ACKNOWLEDGMENT This work was supported by the Russian Federation Ministry of Education and Science, the project № 4.128.2014/K. REFERENCES [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] M.I. Gaiduk, V.V. Grigoryants, A.F. Mironov, L.D. Roytman, V.I. Chissov, V.D. Rumyantseva, G.M. Sukhin. Dokl. AN SSSR. Seriya Biofizika. 1989, vol. 309, N 4, pp. 980-983 (in russ). M.I. Gaiduk, V.V. Grigoryants, A.F. Mironov, V.D. Rumyantseva, V.I. Chissov, G.M. Sukhin. J. Photochem. Photobiol., B.: Biology. 1990. vol. 7, pp. 15-20. A.F. Mironov. Uspekhi Khimii. 2013, vol. 82, N 4, pp. 333-351 (in russ). I.P. Shilov, A.V. Ivanov, V.D. 
Rumyantseva, A.F. Mironov. In book: Fundamental scienses for medicine. Biophysical medical technologies (Ed. A.I. Grigorev and Yu.A. Vladimirov). M.: Maks-Press. Vol. 2, 2015, pp. 110-144 (in russ). A.V. Ivanov, V.D. Rumyantseva, K.S. Shchamkhalov, I.P. Shilov. Laser Physics, 2010, vol. 20, N 12, pp. 2056-2065. M. Gouterman. The Porphyrins (Ed. D. Dolphin), N 4, San Francisco, London: Academic Press. 1978, vol. 3, pp. 1-156. M.I. Gaiduk, V.V. Grigoryants, A.F. Mironov, V.D. Rumyantseva. Proc. Estonian Acad. Sci. Phys. Math., 1991, vol. 40, N 3, pp. 198-204. RF Patent 1340087. Byull. Izobret. N 18 filed 27.06.1995. P. 273. M.I. Gaiduk, V.V. Grigoryants, V.D. Menenkov, A.F. Mironov, V.D. Rumyantseva. Izv. AN SSSR. Seriya Fizika. 1990, vol. 54, N 10, pp. 1904-1908 (in russ). V.D. Rumyantseva, A.S. Gorshkova, A.F. Mironov. Fine Chem. Tech., 2014, vol. 9, N 1, pp. 3-17. RF Patent 2411243. Byull. Izobret. N 4 filed 10.02.2011. P. 731. V.D. Rumyantseva, A.F. Mironov, I.P. Shilov, K.S. Shchamkhalov, A.S. Ryabov, A.V. Ivanov. Abstracts of the XIII Int. Sci-Techn. Conf. «High-Tech in Chemical Engineering2010». Russia, Suzdal. 29.06–02.07.2010. P. 192 (in russ). V.D. Rumyantseva, A.F. Mironov, K.S. Shchamkhalov, G.M. Sukhin, I.P. Shilov, V.M. Markushev, Z.V. Kuzmina, N.I. Polyanskaya, A.V. Ivanov. Lazernaya Medicina, 2010, vol. 14, N 1, pp. 20-25 (in russ). MIPRO 2016/MEET [14] A.S. Stachevski, V.N. Knyukshto, A.V. Ivanov, V.D. Rumyantseva, I.P. Shilov, V.A. Galievsky, B.M. Dzhagarov. Abstracs of the Int. Conf. «Molecular and cellular basics of biosistems functioning». 17.06–20.06.2014. Belarus, Minsk: Book of Abstracs in 2 ch. Ch. 1, 2014, pp. 128-130 (in russ). [15] A.S. Stachevski, V.N. Knyukshto, A.V. Ivanov, V.D. Rumyantseva, I.P. Shilov, V.A. Galievsky, B.M. Dzhagarov. J. Appl. Spectr., 2014, vol. 81, N 6, pp. 938-942. MIPRO 2016/MEET [16] V.D. Rumyantseva, K.S. Shchamkhalov, I.P. Shilov, L.U. Kochmarev, V.M. Markushev, Z.V. Kuzmina, N.I. Polyanskaya, A.S. Ryabov, A.V. Ivanov. Medicinskaya fizika, 2011, N 2, pp. 67-73 (in russ). [17] Yu.V. Alekseev, A.V. Ivanov, A.S. Ryabov, N.M. Shumilova, I.P. Shilov. Ros. Bioterapevticheskiy Zhurnal, 2015, vol. 14, N 1, P. 59 (in russ). 23 Simulation Study of the Composite Silicon Solar Cell Efficiency Sensitivity to the Absorption Coefficients and the Thickness of intrinsic Absorber Layer * * V. Tudić N. Posavec * Karlovac University of Applied Sciences/Department of Mechanical Engineering, Karlovac, Croatia vladimir.tudic@vuka.hr nikola.posavec.1@gmail.com ABSTRACT - In this paper, two silicon solar cells p+-ii-n+ with homogenous and heterogeneous intrinsic absorber layers based on hydrogenated amorphous-nanocrystallinemicrocrystalline silicon (a-Si:H/nc-Si:H/c-Si:H) have been studied by computer modeling and simulation program (AMPS-1D - Analysis of Microelectronic and Photonic Structures). Various factors that affect cell efficiency performance have been studied such as layers absorption coefficients, band gap and layer thickness up to 1200nm. It was found that in the case of standard solar cell conditions a layers absorption coefficient has a major contribution to solar cell performance according to measurement on the actual solar cell samples. It is demonstrated that, for homogenous a-Si:H/nc-Si:H intrinsic absorber layer with constant crystal fraction of Xc=30% cell efficiency is higher than in case of heterogeneous intrinsic absorber layer which contains various crystal fractions depending of absorber layer thickness. 
Second case scenario of silicon thin film composite structure is more common in solar cells production by using PECVD and HWCVD deposition techniques which is proven by X-ray diffraction and high resolution electron microscopy measurements. I. INTRODUCTION In previous work a semi conducting silicon properties, hetero-junctions and photo effect in intrinsic silicon thin films have been studied, and optical generation and recombination of free carriers have been investigated [1, 2]. Also, bases physical principles of simple silicon solar cells with accent to amorphousnanocrystalline layers as promissing composite material for high efficiency solar cells called third generation have been studied. In present work basic principles of pin structure solar cells have been carried out with one dimensional computer modelling programme AMPS-1D. Suggested solar cell model are based on simple pin structure in order to compare simulation results with others in references. Computer modelling programmed allows solar cell parameter calculations and structure design simulations. By varying characteristic set of solar cells parameters such as: illumination spectra, photon flux, absorption coefficient, boundary conditions, front and back contact parameter, general silicon layer parameters as doping and free carrier concentrations, 24 mobility, gap state defect distribution, I-V characteristic, fill factor (FF), and efficiency () of solar cell can be determinate. In this work a few simulation groups are carried out with homogenous distribution of Si nc in layer and with layers where crystallinity and crystal sizes change across the layer. Calculated solar cells efficiency of modelled single and multilayer absorber structures were graphically presented and discussed. A goal of this work is to suggest possible application of a-Si:H/nc-Si:H/c-Si:H film as active part in a typical pin solar cell in respect to overall performance. A detailed comparison of calculated data extracted from defined model suggest optimal absorber layer thickness in composite silicon solar cell and leads to a better understanding of effective solar cell thickness. II. INTRINSIC LAYER MICROSTRUCTURE Amorphous-nanocrystalline silicon (a-nc-Si) films of few hundred nanometers in thickness consist of a matrix of amorphous silicon with embedded silicon crystals of nanometric dimensions [3, 4]. This material has improved properties with respect to pure amorphous Si (a-Si), micro-crystalline Si (c-Si) and bulk crystalline Si (c-Si) owning quantum confinement effect [5]. When compared to amorphous silicon, nc-Si has better electrical transport characteristic [6], possibility to tailor the optical band gap [7] and resistance to light induced degradation [8]. Micro-crystalline silicon (c-Si) films of 1-2 micrometers in thickness consist of small crystal aggregates in deeper layers and large grains/columns up to film surface [9, 10]. On the other hand microcrystalline Si (c-Si) and bulk crystalline Si (c-Si) compared to nc-Si has better conductivity along crystal grains according to higher free carrier mobility and less numbers of grain boundaries. Therefore c-Si layers has better electrical transport characteristic compared to a-ncSi [11]. The main advantage of nc-Si in comparison with crystalline silicon is its higher absorption, which allows efficient solar cells in thin film designed device. 
This MIPRO 2016/MEET material could be fraction of “small crystals”, expected to show effect of increase in optical gap due to quantum confinement predictable according to effective mass theory or quantum dots. band gap energy (Eg=1.82eV) caused by lower crystalline fraction (vol. 30%). As a result of PECVD and HWCVD deposition techniques micro-crystalline silicon thin films typically forms a microstructure presented in Figure 1. It has a complicated microstructure, mixture of crystalline silicon (c-Si) grains, grain boundaries and/or amorphous-nano-crystalline hydrogenated silicon (a-ncSi:H) often called ”tissue”. Approach followed in this paper is to compare on as wide as possible range of a-ncc-Si:H layer-samples the microstructure (grain size, crystallinity and roughness) with the optical and transport properties and to find overall performance. We have used the fact that a-nc-c-Si:H microstructure changes with the thickness of sample [11]. AFM and SEM micrographs of some authors [10, 12] reveals surface morphology of micro-crystalline silicon films thickness as sample of 1.4 micrometers published in [12]. Figure 2. Distribution of absorption coefficient of nc-Si (black line), a-Si (green line) and c-Si (red line) calculated from FTPS and PDS [4]. IV. SIMULATION MODEL Computer modeling software AMPS-1D can simulate all modeled semiconductor and photovoltaic device structures. In the present version of AMPS-1D user can chose one of two different calculation models: Density of States (DOS approach) and Carrier Lifetime Model (CLM). DOS approach is more suited for silicon amorphous and nano-crystalline thin films layers due to large defect densities in midgap states [13, 14]. Figure 1. Typical SEM micrograph of composite silicon (a-nc-cSi:H) thin film 1.4 m of thickness produced with Cat-CVD (Hot Wire CVD) technique, published in [12]. Large complexity of microstructure in hydrogenated microcrystalline silicon and existence of tissue with at least two different sizes of crystallites determine the optical properties and therefore modeled complex free carrier mobility based on mechanism of transport. III. OPTICAL PROPERTIES The optical properties of composite silicon thin films strongly depend on the production conditions and determination of the structural properties of the thin film. The spectra of absorption coefficient, (E), of typical ncSi thin film calculated from transmittance, Fourier transform photocurrent spectroscopy (FTPS) and photo deflection spectroscopy (PDS) are shown in Figure 2. Absorption coefficient data used in our calculation and device simulation was published in literature [3, 4]. Amongst others, sample number two (K2) is chosen in simulation modeling because of convenient value of its MIPRO 2016/MEET Model of photovoltaic device can be simulated through optical and material parameters on device designed structure. Material and optical properties affect electrical parameters in tree differential equations in correlation: the Poisson's equation, electron continuity equation and hole continuity equation. Those tree equations are solved simultaneously under nonequilibrium steady-state conditions (i.e., under the effect of light, voltage bias or both) by using method of finite differences and Newton-Raphson technique. 
The equations used are: Poisson's equation,

(d/dx)[ε(x)∙dΨ(x)/dx] = −ρ(x),  (1)

the electron continuity equation,

(1/q)∙dJn(x)/dx = R(p(x), n(x)) − Gop(x),  (2)

and the hole continuity equation,

(1/q)∙dJp(x)/dx = Gop(x) − R(p(x), n(x)).  (3)

All carriers in a semiconductor layer can be described by the net charge density ρ(x), expressed as

ρ(x) = q∙[p(x) − n(x) + pT(x) − nT(x) − NA + ND],  (4)

and the electrostatic field E is defined as

E(x) = −dΨ(x)/dx.  (5)

In these equations ε is the dielectric constant, E the electrostatic field, Ψ(x) the position in energy of the local vacuum level, x the position in the device, n and p the extended-state densities in the conduction and valence band, respectively, pT and nT the trapped hole and electron population densities, NA the acceptor doping density, ND the donor doping density (if present), q the electron charge, R(x) the recombination rate, Gop(x) the optical generation rate of free electron-hole pairs, and Jn and Jp the electron and hole current densities, respectively. The term R(x) is the net recombination rate resulting from band-to-band (BTB) direct recombination and Shockley-Read-Hall (SRH) indirect recombination traffic through gap states. The model used in AMPS for indirect recombination assumes that the traffic back and forth between the delocalized bands and the various types of localized gap states is controlled by SRH capture and emission mechanisms. Since AMPS has the flexibility to analyze device structures under light bias (illumination) as well as voltage bias, the continuity equations include the term Gop(x), the optical generation rate as a function of x due to the externally imposed illumination.

Generally, three state variables completely define the state of a device: the local vacuum energy level Ψ and the quasi-Fermi levels EFp and EFn. Once those three dependent variables are calculated as functions of position in the device, all other parameters can be determined from them. In thermodynamic equilibrium the Fermi level is constant as a function of position, and hence the three equations (1-3) essentially reduce to Poisson's equation; the local vacuum energy level Ψ is then the only variable to solve for. Otherwise, in the non-equilibrium steady state, a system of three coupled non-linear second-order differential equations in the three unknowns (Ψ, EFp, EFn) is obtained. Further calculation needs six boundary conditions, two for each dependent variable. The first two boundary conditions are modified versions of the ones used for solving Poisson's equation in thermodynamic equilibrium:

Ψ(0) = (χ0 − χL) − (ΦbL − Φb0) − V  (6)

and

Ψ(L) = 0,  (7)

where L is the total length of the modeled device, χ0 and χL are the electron affinities at x=0 and x=L, respectively, Φb0 and ΦbL the contact barrier heights, and V the applied voltage. The zero of the local vacuum energy level is taken at the boundary point x=L.

The four other boundary conditions are obtained by imposing constraints on the currents at the boundaries at x=0 and x=L. These constraints force the mathematics to acknowledge the fact that the currents must cross at x=0 and x=L (the contact positions) by either thermionic emission or interface recombination. The current values at the boundaries are expressed mathematically as:

Jn(0) = qSn0∙[n(0) − n0(0)],  (8)

Jp(0) = qSp0∙[p(0) − p0(0)],  (9)

Jn(L) = qSnL∙[n(L) − n0(L)],  (10)

Jp(L) = qSpL∙[p(L) − p0(L)],  (11)

where Sn0 and Sp0 are the surface recombination velocities for electrons and holes, respectively, at the x=0 interface and SnL and SpL the corresponding velocities at the x=L interface. Being limited by thermionic emission, the largest value of the recombination velocities cannot exceed 1×10^7 cm/s. In equations (8-11), n(0) and p(0) are the electron and hole densities at x=0 and n(L) and p(L) the same values at x=L; analogously, n0(0) and p0(0), n0(L) and p0(L) are the electron and hole densities in thermodynamic equilibrium at the boundaries x=0 and x=L, respectively. Now, when all conditions are defined, Ψ, EFp and EFn can be calculated simultaneously. The model used in our simulation for the gap states consists of two exponential tail-state distributions and two Gaussian distributions of deep defect states [13].

V. SOLAR CELL PARAMETERS

In this paper we have simulated two types of single-junction solar cells, both with an absorber layer thickness of 1200 nm. The first type has the standard pin structure with one homogeneous absorber: (p-) a-Si:H / (i-) a-Si:H / (n-) a-Si:H; the second type, with an inhomogeneous absorber, has a multi-layer structure with 9 intrinsic absorbers of various structural and therefore optical properties (Fig. 2). The layer thicknesses for the standard pin structure are as follows: (p-) 8 nm / (i-) 1400 nm / (n-) 15 nm. We propose an implementation of a chemically textured zinc-oxide ZnO:Al film or SnO2 as the front TCO in our p-i-n solar cells, in combination with Ag as a textured back reflector-enhancing dielectric layer for additional efficiency. Such modeled cells exhibit excellent optical and light-trapping properties demonstrated by high short-circuit current densities.

A. Boundary conditions

The three governing equations (1), (2) and (3) must hold at every position in the device, and the solution to those equations involves determining the state variables Ψ(x), EFp(x) and EFn(x). Such non-linear and coupled equations cannot be solved analytically, so numerical methods must be utilized. Boundary conditions must be imposed on the sets of equations. These are expressed in terms of conditions on the local vacuum level and the currents at the contacts. To be specific, the solution to equations (1), (2) and (3) must satisfy the boundary conditions (6-11). In the computer modeling program AMPS-1D, Φb0 is PHIB0, ΦbL is PHIBL, and S is the recombination speed for holes or electrons, depending on the carrier and the position. The parameter RF is the reflection coefficient at x=0 and RB the reflection coefficient at x=L. In the actual situation RF accounts for the photon flux lost in transmission through the glass substrate and the TCO layer. Measurements of the internal optical losses of solar cells (in the visible spectrum) show a wavelength (λ) or energy dependence [5, 6, 15]. The reflection coefficient RB depends on the optical characteristics of the back electrode, which acts as an optical mirror for efficient light trapping. The properties of the front contact and back contact used as the model parameters are shown in Table I.
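To give a concrete feel for the finite-difference/Newton-Raphson scheme outlined above, the following self-contained sketch solves only the thermodynamic-equilibrium limit, in which equations (1)-(3) reduce to a single nonlinear Poisson equation for Ψ with Boltzmann carrier statistics. All numerical values (intrinsic density, doping profile, permittivity, grid) are assumptions chosen for illustration; this is not the AMPS-1D code, and it omits the gap-state model, the quasi-Fermi levels and the current boundary conditions (8)-(11).

```python
import numpy as np

# Assumed (illustrative) silicon parameters, not the values used in the paper.
q, vt = 1.602e-19, 0.02585            # elementary charge (C), thermal voltage (V)
eps = 11.9 * 8.854e-12                # permittivity, F/m
ni = 1.5e16                           # intrinsic carrier density, m^-3
L, N = 2.0e-6, 201                    # device length (m), number of grid points
x = np.linspace(0.0, L, N)
h = x[1] - x[0]

# Net doping C(x) = ND - NA: a simple p-n junction profile for illustration.
C = np.where(x < L / 2, -1.0e22, 1.0e22)

# Initial guess: local charge neutrality; the contact values stay fixed (Dirichlet).
psi = vt * np.arcsinh(C / (2.0 * ni))

# Newton-Raphson iteration on the discretized equilibrium Poisson equation
#   (psi[i-1] - 2*psi[i] + psi[i+1]) / h^2 = (q/eps) * (n - p - C),
# with n = ni*exp(psi/vt) and p = ni*exp(-psi/vt).
for it in range(100):
    n = ni * np.exp(psi / vt)
    p = ni * np.exp(-psi / vt)
    F = (psi[:-2] - 2.0 * psi[1:-1] + psi[2:]) / h**2 \
        - (q / eps) * (n[1:-1] - p[1:-1] - C[1:-1])
    # Tridiagonal Jacobian of F with respect to the interior psi values (stored densely).
    J = np.zeros((N - 2, N - 2))
    np.fill_diagonal(J, -2.0 / h**2 - (q / eps) * (n[1:-1] + p[1:-1]) / vt)
    idx = np.arange(N - 3)
    J[idx, idx + 1] = 1.0 / h**2
    J[idx + 1, idx] = 1.0 / h**2
    dpsi = np.linalg.solve(J, -F)
    psi[1:-1] += np.clip(dpsi, -0.1, 0.1)   # damped update for robustness
    if np.max(np.abs(dpsi)) < 1e-9:
        break

print(f"Newton iterations: {it + 1}; built-in potential ~ {psi[-1] - psi[0]:.3f} V")
```

In AMPS-1D itself the same Newton-Raphson machinery is applied to the full coupled system (1)-(3) in the three unknowns (Ψ, EFp, EFn), subject to the boundary conditions (6)-(11) and the gap-state model described above.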
Front Contact Back Contact PHIB0 = 1.730 eV SN0 = 1x107 cm/s SP0 = 1x107 cm/s PHIBL = 0.120 eV SNL = 1x107 cm/s SPL = 1x107 cm/s RF = 0.250 RB = 0.600 Solar cells design Second type model of solar cell with multilayer absorbers will be detailed explained in this part. Solar cell design in fact represents a-Si:H/nc-Si:H/c-Si:H inhomogeneous intrinsic absorber of pin silicon solar cell with thickness of 1200 nm (Fig.3). Figure 3. Drawing of composite silicon (a-nc-c-Si:H) thin film cross section of 1.2 m thickness; it consists of multi layers with different structure and absorption properties. Design complies of multi-layer structure with 9 individual intrinsic absorbers arbitrary modeled thickness with various structural and optical properties (Fig.4). Solar cell model structure is as follows: Transparent Conductive Oxide (TCO) is solar cell front contact for collecting holes (0.3 m), a-Si:H (p-1) window layer (8 nm), a-nc-Si:H absorbers i1-2/i4-5 (100nm), nc-Si:H absorber i5-6/i6-7 (100 nm), nc-cSi:H absorber i7-8/i8-9 (200 nm), cSi:H absorber i9-10 (400 MIPRO 2016/MEET nm), a-Si:H (n-11) back layer (15 nm), Aluminumdoped Zinc Oxide (AZO) is reflection layer (0.1 m), Ag as a textured back reflector (0.5-2 m), Aluminum back contact electrode (2-3 mm). Figure 4. Model of solar cell with multi layer absorbers: (p-) is layer 1, (n-) is layer 11 and layers 2-10 are 9 individual intrinsic absorbers with various thicknesses, structural and optical properties [18]. The input parameters of all modeled solar cell layers in order to simulate efficiency properties of actual solar cell are not given in complete because of complexity. Published selected mobility parameters are taken from references [16, 17] and absorption coefficients from anc-Si:H and c-Si:H samples. We used standard boundary conditions and standard global illumination conditions (Table I), air-mass 1/cos (AM1.5 spectrum), 1000W/m2 at 300 K temperature reference. C. Absorption coefficients One set of absorption coefficient data we used in our calculation and device simulation was measured on samples and published in literature [3, 9]. It is coefficient for a-nc-Si:H sample and c-Si:H silicon layer. Amongst others, a-nc-Si:H sample is chosen in this modeling because of convenient value of its band gap energy (Eg=1.82eV) caused by lower crystalline fraction (vol. 30%), layer thickness (100 nm) and calculated DC conductivities and free carrier mobility [2]. Presented modeled device of solar cell (Fig.4) consists of 9 absorber layers with 2 actual absorption coefficients and 7 approximated absorption coefficients between a-ncSi:H (front) and c-Si:H (back) silicon layer. Therefore, absorption coefficients of each individual absorber layer are modeled and proposed by linear approximationsuperposition according to different microstructure (Fig.3) through solar cell variable length (parameter x). According to layer thickness or calculated length x each individual absorber layer has different superposition ratio of absorption coefficients between a-nc-Si:H and c-Si:H layers as it is in actual solar cell. For example: absorption layer 2 (i1-) in modeled solar cell is a-nc-Si:H actual sample with measured absorption coefficient, absorption layer 3 (i2-) is a-nc-Si:H modeled layer with proposed absorption coefficient in ratio a-nc-Si/c-Si (0,25/0,75), absorption layer 3 (i2-) is a-nc-Si:H modeled layer with proposed absorption coefficient in ratio a-nc-Si/c-Si (0,35/0,65), etc. 
Last layers are: absorption layer 9 (i8-) in modeled solar cell is nc-c-Si:H layer with proposed absorption coefficient in ratio nc-Si/c-Si (0,85/0,15) and absorption layer 10 (i9-) is c-Si:H actual sample with measured absorption coefficient (Fig. 5). 27 1000000 apsorption coefficient  (1/cm) 100000 10000 a-nc-Si:H sample lin. aprox. 1 lin. aprox. 2 lin. aprox. 3 lin. aprox. 4 lin. aprox. 5 1000 lin. aprox. 6 lin. aprox. 7 mc-Si:H sample 100 300 400 500 600 700 800 900 lambda  (nm) Figure 5. Absorption coefficients () of nine absorber layers in modeled solar cell [18]. hundred nanometers of thickness of a-nc-Si:H tissue. Current densities in our simulation had never reached its calculated maximum in both cases. In first solar cell model simulation improvement in efficiency is supported by at least one order of magnitude better absorption in anc-Si:H homogenous absorber, good conductivity and therefore high free carrier mobility. Also, efficiency curve points to saturation at 600 nm and its decrease at thicknesses higher then 900 nm. Calculated solar cell performances in first model simulation are likely expected according to physical nature of photogeneration and recombination of electron-hole pairs in intrinsic silicon with controlled general silicon layer parameters as doping and free carrier concentrations, mobility and gap state defect distribution. 20 Free carriers mobility VI SIMULATION RESULTS AND DISCUSION The performance of modeled solar cells was analyzed in respect to the current density (JSC) and efficiency () by incorporating the layer parameters into AMPS- 1D. First pin structure consists of homogenous intrinsic a-nc-Si:H absorber in thickness of 1200 nm with constant crystal fraction of Xc=30%. In second pin structure with the same thickness we modeled an inhomogeneous intrinsic absorber which consists of 9 individual homogenous layers with different optical and electrical properties suggesting experimentally proven structure in-homogeneity. In standard simulation conditions defined earlier for the first structure simulation results shown predictable curves and maximal value of JSC=17.421mA/cm2 at 1200 nm of absorber thickness and maximal efficiency of =13.992% at 935nm (Fig.6.). In second modeled structure calculated values are different: JSC=14.781mA/cm2 at 1200 nm of absorber thickness and maximal solar cell efficiency of =12.32% at 492 nm (Fig.7.). For both type of modeled devices simulation promotes typically exponentional rise of current density in the first 300 nm of thickness according to excellent photon absorption and collection of photo-generated electron-hole pairs in absorber structure with 10-100 ns free carrier life time in first few 28 18 16 14 2 Jsc (mA/cm ); efficiency (%) In mixed phases silicon thin film layers transport mechanism strongly depends of carrier mobility (cm2 V-1s-1). Electron n and hole p mobility have dependence to crystal lattice temperature also donor-like and acceptor-like doped concentrations [18], defect density [11], DC conductivity [19], suggesting electron mobility at temperature of 300 K maximal values of 1250 cm2 V-1s-1 and hole mobility maximal values of 400 cm2 V-1s-1 in bulk (intrinsic) crystalline silicon. For electron mobility at temperature of 300 K in a-Si:H intrinsic silicon layers in simulation values are modeled as follows: (MUN) 10-20 cm2V-1s-1, (MUP) 2-4 cm2V-1s-1. 
For a-nc-Si:H thin film layers values are: (MUN) 100-250 cm2V-1s-1, (MUP) 8-60 cm2V-1s-1; for nc-Si:H layers (MUN) 400-650 cm2V-1s-1, (MUP) 100-180 cm2V-1s-1, for nc-Si:H/c-Si:H layers (MUN) 800-1000 cm2V-1s-1, (MUP) 200-300 cm2V-1s-1; and for c-Si:H layers (MUN) 1200-1250 cm2V-1s-1, (MUP) 300-400 cm2V-1s-1. 12 10 current density Jsc efficiency (%) 8 6 0 200 400 600 800 1000 1200 1400 absorber thickness (nm) Figure 6. Graphical presentation of calculated solar cell current density JSC (mA/cm2) and efficiency () in case of homogeneous absorber with crystal fraction of 30%. Second pin model solar cell design implies experimentally proven structure in-homogeneity of silicon CVD thin films. Absorption coefficient decreases drastically through structure instead of increasing of free carrier mobility according to tissue structure changes. As a result of different optical and electrical properties in the structure layers of the in-homogenous solar cell calculated performance is expectable. Optimum efficiency of =12.32% is reached at 492 nm (Fig.7.) and current density of JSC=12.21 mA/cm2 exists at the same absorber thickness. 16 15 14 J SC (mA/cm2); efficiency  (%) D. 13 12 11 current density Jsc 10 y = 2E-08x3 - 4E-05x2 + 0,0307x + 5,8456 efficiency 9 Poly. (efficiency ) 8 7 6 0 200 400 600 800 1000 1200 1400 absorber thickness d (nm) Figure 7. Graphical presentation of calculated solar cell current density JSC (mA/cm2) and efficiency  () in case of inhomogeneous absorber presented with 9 different absorber layers. MIPRO 2016/MEET In Figure 7 additional curve (black line) represents third degree Polynomial of efficiency curve suggesting efficiency calculation in any absorber thickness (absorber dimension x). VII CONCLUSION In this study we have simulated two types of single junction solar cells with absorbers of a-nc-Si:H and aSi:H/nc-Si:H/c-Si:H tissues. A series of simulations were carried out in order to calculate the efficiency of modeled solar cell by varying the properties each of layers in the range published in the literature. The obtained results show clearly that the cells reach its optimum efficiency at different thicknesses. In case of homogenous silicon a-nc-Si:H tissue expected efficiency is around 14% and this value is in correlation with crystal fraction, absorption coefficient and free carrier mobility, respectively. In case of in-homogenous a-Si:H/ncSi:H/c-Si:H absorber expected efficiency is around 12% and lower strongly depending of structure characteristic. The observed silicon layers specificity in the optical and electrical properties can be explained as a consequence of thin film deposition techniques forming regions of nano and micro crystals with arbitrary concentrations in amorphous matrix to determine free carrier transport model. However, the goal of this work is quite ambitious in its choice to enable a preventive efficiency calculation of composite silicon solar cells before its production if deposition techniques are known, constant and stabile. Building of data matrix concerning arbitrary concentrations of crystals and thickness of deposition layers it could be possible to predict solar cell efficiency using for example a principal method and third degree Polynomial approximation such as the one derived here. REFERENCES [1] D. Gracin, K. Jurajić, I. Djerdj, A. Gajović, S. Bernstorff, V. Tudić, M. 
Čeh, “Amorphous-nanocrystalline silicon thin films for single and tandem solar cells”, 14th Photovoltaic Technical Conference - Thin Film & Advanced Silicon Solutions, June, 2012, Aix en Provence, France. [2] V. Tudić, “AC Impedance Spectroscopy of a-nc-Si:H Thin Films”, Scientific Research Engineering, July, 2014, vol. 6, No. 8, pp. 449-461. doi: 10.4236/eng.2014.68047. [3] J. Sancho-Parramon, D. Gracin, M. Modreanu, A. Gajovic, “Optical spectroscopy study of nc-Si-based p-i-n solar cells“, Solar Energy Materials & Solar Cells 93, 2009, pp. 1768-1772. [4] D. Gracin, A. Gajović, K. Juraić, J. Sancho-Parramon, M. Čeh: “Correlating Raman-spectroscopy and high-resolution transmission-electron-microscopy studies of amorphousnanocrystalline multilayered silicon thin films“; Thin Solid Films 517, 2009, vol. 18, pp. 5453-5458. [5] A. M. Ali, “Origin of photoluminescence in nanocrystalline Si:H films“, Journal of Luminescence, 2007, vol. 126, pp. 614622. [6] A. V. Shah, J. Meier, E.Vallat-Sauvain, N. Wyrsch. U. Kroll, C. Droz, U. Graf, “Material and solar cell research in microcrystalline silicon“, Solar Energy Materials & Solar Cells, 2003, vol. 78, pp. 469-491. [7] A. M. Ali, “Optical properties of nanocrystalline silicon films deposited by plasma-enhanced chemical vapor deposition“, Optical Materials, 2007, vol. 30, pp. 238-243. MIPRO 2016/MEET [8] S. Hazra, S. Ray, “Nanocrystalline silicon as intrinsic layer in thin film solar cells, Solid State Commun., 1999, vol. 109, pp. 125-128. [9] D. Gracin, A. Gajović, K. Juraić, J. Sancho-Parramon, M. Čeh: “Correlating Raman-spectroscopy and high-resolution transmission-electron-microscopy studies of amorphousnanocrystalline multilayered silicon thin films“, Thin Solid Films, 2009, vol. 517, Is. 18, pp. 5453-5458. [10] Kočka, J., Stuchlíkova, H., Stuchlík, J., Rezek, B., Mates, T., Švrcek, V., Fojtík, P., Pelant, I., Fejfar, A., “Microcrystalline silicon - relation of transport and microstructure“, Solid State Phenomena, 2001, Vol. 80-81, pp. 213-224. [11] Kočka, J., Stuchlíkova, H., Stuchlík, J., Rezek, B., Mates, T., Švrcek, V., Fojtík, P., Pelant, I., Fejfar, A., Model od transport in Micro-crystalline silicon“, Journal of Non-Crystalline Solids, 2002, Vol. 299-302, pp. 355-359. [12] Moutinho, H.R., Jiang, C.-S., Perkins, J., Xu, Y., Nelson, B.P., Jones, K.M., Romero, M.J., Al-Jassim M.M., “Effects of dilution ratio and seed layer on the crystallinity of microcrystalline silicon thin films deposited by hot-wire chemical vapor deposition“, Thin Solid Films, 2003, Vol. 430, Issues 1–2, pp. 135–140. [13] A. Belfar, R. Mostefaoui, “Simulation of n1-p2 Microcrystalline Silicon Tunnel Junction with AMPS-1D in a-SiC:H/c-Si:H Tandem Solar Cell, Journal of Applied Science, 2011, pp. 10.3923. [14] S. Tripati, R. O. Dusane, “AMPS-1D simulation studies of electron transport in mc-SI:H thin films“, Journal of NonCrystalline Solids, 2006, vol. 352, pp. 1105-1108. [15] R. H. Franken, R. L. Stolk, H. Li, C. H. M. Van der Werf, J. K. Rath, R. E. I. Schropp, “Understanding light trapping by light scattering textured back electrodes in thin film n-i-p-type silicon solar cells“, Journal of Applied Physics, 2007, vol. 102, pp. 014503. [16] D. Stieler, V. D. Dalal, K. Muthukrishnan, M. Noack, E. Schares, “Electron mobility in nanocrystalline devices“, Journal of Applied Physics, 2006, vol. 100, doi: 10.1063/1.2234545. [17] B. Van Zeghbroeck, “Mobility Carrier Transport, Principles of semiconductor devices“, ECEE University of Colorado, 2011. [18] V. 
Tudić, “Modeling of Electric characteristics of the photovoltaic amorphous-nanocrystalline silicon cell“, doctoral thesis, FER Zagreb, 2014, 245 pp. [19] K. Shimakawa, “Photo-carrier transport in nanocrystalline silicon films“, Journal of Non-Crystalline Solids, 2006, vol. 352, Issues 9-29, pp. 1180–1183. 29 The investigation of influence of localized states on a-Si:H p-i-n photodiode transient response to blue light impulse with blue light optical bias # Marko Čović#, Vera Gradišnik# and Željko Jeričević* Engineering Faculty/Department of Electrical Engineering, Rijeka, Croatia Engineering Faculty/Department of Computer Engineering, Rijeka, Croatia * zeljko.jericevic@riteh.hr Abstract - The series of experiments measuring the transient response of a-Si:H pin photodiode to light impulses superimposed to constant light (optical bias dependence of modulated photocurrent method - OBMPC) of the same wavelength (430 nm) and various reverse voltages on photodiode was performed in order to characterize localized states of the energy gap of amorphous silicon and their influence on photocurrent degradation. The responses were analyzed as a sum of decaying exponential functions using the least squares method and a generalized Fosse's algorithm. This type of response is typical for independent relaxation processes running at the same time. Experiments and subsequent data processing illustrate feasibility of the method and results for the transient response of a-Si:H pin photodiode. The results strongly suggest two energy levels between 0.32 eV and 0.45 eV. These results were obtained applying the optical ac blue and dc blue bias light in a low frequency regime. I. INTRODUCTION Exponential decay is typical for a single relaxation processes in physics and first order chemical reactions in chemical kinetics, as well as in biology. In more complex situations where few independent processes of this type are going on in parallel, the summary measurements from the system consist of the sum of exponential functions. For example, in a mixture of radionuclides, each one decays independently and with a different rate controlled by its half life. Although the separation of exponentials from summary signal looks deceptively simple, it is actually a tough numerical problem because of the nonorthogonality of exponentials. Attempts to separate exponentials with close half times iteratively by nonlinear least squares usually results in large number of iterations and no convergence. For the analysis reported here we used the least squares method with a linearization step based on numerical integration. After the linearization, the solution of the multi-exponential problem is obtained by solving an over-determined system of linear equations followed by finding the roots of polynomial. The number of exponentials in the signal dictates the degree of polynomial, rank of the linear system, and multiplicity of numerical integration. The advantage of accurate linearization is that the separation of exponentials becomes a noniterative procedure and the condition number of the linear system can be used to control the 30 quality of solution. The procedure is completely generalized and initial guesses are not necessary. It is also simple to implement non-negativity constrains on a solution. The complex nature of localized states, such as native and metastable defects in a-Si:H [1], has an influence on a-Si:H p-i-n photodiode transient response. 
Other authors used the multiexponential trapping rate and modulated photocurrent (MPC) technique [2] to determine parameters of localized states throughout the entire energy gap by employing frequency and temperature scans. We examined the nature and the kinetics of light-induced defects creation in a-Si:H films and photodiode and their influence on photocurrent degradation. We measured and analyzed the transient response of a-Si:H p-i-n photodiode to blue light impulse superimposed to the blue light optical bias (optical bias dependence of modulated photocurrent method – OBMPC [2, 3]) at various reverse bias voltages and one frequency. By means of OBMPC, the trap and recombination localized states parameters throughout the entire energy gap can be identified. To the low-frequency MPC data the deeper recombination centers also contribute [2]. The purpose of this work is to identify the nature and role of trapping and recombination process of mobile carriers. This was done under condition of bias and modulated blue light of weak illumination intensity, at low modulation frequency and at applied low reverse bias voltages on a-Si:H p-i-n photodiode. In this regime, the localized states with deeper energy levels in low frequency regime can be identified. Based on experimentally obtained results, the photodiode transient responses are reconstructed with the help of numerical modeling. Under described experimental conditions, the measured transient responses show the presence of one or two decaying exponential functions corresponding to two energy levels between 0.32 eV and 0.45 eV. II. THEORY AND RESULTS The basic idea of the computing method was first proposed by Foss [4], and was later used by Matheson [5] in chemistry and Jericevic [6] in biology. In all above mentioned papers the method was developed and applied to special cases, no general solution was developed, MIPRO 2016/MEET although Foss claimed it is possible to construct one. The general solution was finally developed by Jericevic [7], and it does represent the foundation on which here described data processing is based. The approach outlined here offers unprecedented flexibility addressing important problems of accurate and fast (real time) processing for multiple models (for an arbitrary number of exponential terms) using the same computing methodology. Previous implementations required that specialized subroutines for each individual case (mono-exponential, bi-exponential, tri-exponential, etc) have to be written. The major steps in out methodology are: 1. 2. 3. 4. 5. Linearization by numerical integration method Solution of linear system of equations with or without non-negativity constrains Determining coefficients and roots of polynomial equation(s) based on linear system solutions. Decay constants computed in the previous step are used to compute pre-exponential terms. Testing and verifying the results by detailed error analysis. The detailed development of a general solution is presented in [7] and here we are giving a brief description of the final result only: y = G e n e r a l s o lu t i o n f o r N ∑ Ai e − k t i i =1 N p1 = −∑ ki (1) i =1 pn = − N! ( N − n )!n! n ∑ ∏k i =1 1 m ∈ ( N Cn ) m N p N = −∏ k i (2) ( 3) i =1 N pN +1 = ∑ Ai ( 4) i =1 i= N pN + 2 = ∑ Ai i =1 pN + n = N ∑ j =1; j ≠ i 1 ∑ Ai (n − 1)! i =1 N ( 5) kj ( N −1)! ( N − n )!( n −1)! ∑ j =1; n −1 ∏k 1 m m ∈ ( N −1Cn −1 ; m ≠ i) MIPRO 2016/MEET ( 6) p2 N = N N 1 A ∑ i ∏ kj ( N − 1)! 
i =1 j =1; j ≠i (7) Where vector of parameters p is a solution of linear system based on consecutive N-fold numerical integration of multi-exponential equation in time t (N is the number of exponential terms in the summary signal y). From the first N polynomial equations in k (vector of decay constants) is calculated. This set of polynomial equations are presented in brief as (1) to (3). After vector k is known, the computing of parameter vector A (vector of population of states) from the last N equations (4) to (7) becomes a linear problem. The details and complete development of the solution is in [7]. Our rationale for using the general multi-exponential solution is that we did not postulate the number of components (N) in advance. Instead, we let the data determine N by fitting with an increasing number of exponentials and subsequently used the best fit for photodiode characterization. The a-Si:H p-i-n photodiode, due to the presence of localized states in the energy gap, shows the multiexponential decay in transient response on light pulse when the light is taken off. The number of components included in response was not known in advance. In our experiments, the a-Si:H p-i-n photodiode is illuminated with monochromatic blue LED (Kinghbright FULL COLOR RGB LAMPS, 430 nm, IF= 20 mA) light consisting of a constant (bias) and pulsed (probe) light beams of the same relatively weak intensity and low frequency. Consequently, the electron-hole pair photogeneration rate contributes to a constant and a pulsed (transient) mobile carrier density component. For blue light pulse and bias illumination, electron-hole pairs are generated near the front surface. Our sample of a-Si:H p-in is illuminated from p-type layer electrons. We assume that the majority carriers are in i-type layer. Due to trapping and release interaction of free carriers with band gap localized states, the resulting transient photocurrent will have time delay with respect to the light excitation. Also, the photocurrent decay will happen long before the photocurrent reduction which is due to free carrier recombination. The carrier from shallow trap are reemitted soon after capture, but the carriers in a deep trap reside longer, are practically lost, and are the major cause for the transient photcurrent decay (base-line current tail). The transition of mobile carriers, detected in the experiment, relies only on the exchanges between localized gap states and extended states. The hole, at blue light absorption, move directly into the front contact and their contribution to the transient photocurrent is small [3]. The dc illumination determines the position of quasiFermi level for electrons, Etn and holes, Etp and can be deduced from the measurement of the dc photocurrent from [8, 9] EC − Etn = kT ⋅ ln( μ n N C Aqξ / I phdc ) (8 ) where is the mobility µp = 10 cm2V-1s-1, the effective density of states Nc = 1020 cm-3, ξ is bias voltage dependent electric field, and attempt-to-escape frequency ν0 =1⋅1012 s-1. The time response characteristic for dc part 31 Figure 1. Measured and calculated switch-off transient response of aSi:H p-i-n photodiode on blue light pulse at blue bias light at 1.5 V bias reverse voltage. Measured is a summary signal (experimental), Theory is fitted function, Energies are fitted components and Measured – Theory is a difference between experimental measurements and fitted function. Figure 2. 
Energies of two energy levels obtained from measured transient responses of a-Si:H p-i-n photodiode by blue light pulse at blue bias light at reverse bias voltages from 0 V to 2 V. i is close to Etn , E ωi n ≈ E tni . In region where ω << ω ci and ω are comparable to frequency regime. ω ci is the so called low of generation rate of the gap state at the energy E is given by [9] 1/τ(E) = cnndc + cppdc + en(E) + ep(E), where ndc, pdc are free electron and hole density, cn, cp capture coefficients of electrons, holes, en(E), ep(E) emission frequency toward the conduction, valence band. The occupation function of the gap states fdc at constant illumination change from 1 to 0 in two steps which occurring at two quasi-Fermi levels of trapped carriers, Etn and Etp, at which the emission of electron and hole are equal to characteristic capture frequency, ωc. The energy gap divide, dependently on localized states energy positon, two quasi-Fermi levels, in electron trapping states (E > Etn), hole trapping states (E < Etp) and recombination states (Etp < E < Etn). All the states have characteristic response time shorter than the period of the ac signal. The phase shift is low and is not induced by trapping-and-release events of the MPC. The energies fall in recombination centers and the information on gap states at Etp and Etn is done. In the low frequency (LF) regime the DOS (Density Of States) at the quasi Fermi level of trapped carriers is as we propose the modified expression from one done in [9] At bias voltage Vi, the characteristic frequency, ω c , where the delay time td and pulse period Tp, Gdc the dc generation rate, kB Boltzmann constant, T temperature. (9) The transient photocurrent decay is due to transit time, which is voltage dependent. Due to localized states the transit time has two component, corresponding to two energies, and expressed [3] as i which is the capture rate of electrons and holes into each type or probed gap state is given by [2] ωci = ndc cni + pdc cip where ndc (pdc) are free carrier electron (hole) density and cni (c ip ) the capture coefficient for an electron (hole) changes. The applied reverse bias voltage scan provides the spectroscopy of the gap states instead of temperature scan. To characterize the transient behavior two other energies have to be introduced: the characteristic time response of the gap states and the characteristic time, which is the period of the ac signal. In the MPC (Modulated PhotoCurrent method) experiment the ac behavior is characterized with two other energies, comparing the inverse of characteristic time response of the localized gap states 1 / τ( E ) and the angular frequency ω of the ac signal. If ω<ωc than 1/τ(E)> ωc>ω. The distribution energy described by relation from [2, 8] EC − Eω n = kT ln(ν 0 / ω ) 32 (10 ) N ( Etn ) ≈ Gdc td kBT (Tp / 2) ⎛ d tt = t0 + ∑ N ( Ei )σ vth ⎜ i ⎝ μnξ (11) ⎞ −1 ( Ei / kT ) ⎟ν i e ⎠ (12 ) where time that electron spend in conduction band is t0, thermal velocity vth, electron capture cross section σ, discrete localized states N(Ei) at Ei energy levels, electric field ξ, electron mobility µn, intrinsic layer width d. The a-Si:H p-i-n photodiode structure used in our experiment is well described in [10] and their transient response on light pulse of blue, green and red light at 2V reverse bias in [11]. 
The p-i-n structure was deposited on a transparent conductive oxide (TCO) coated glass from undiluted SiH4 by plasma-enhanced CVD and is as follows: glass/TCO/ptype (5 nm)/i-type (300 nm)/n-type (5 nm)/Al back contact as described in [10]. The n-type layer was made by adding phosphine and the p-type by adding diborane to the gas mixture. The back contact was aluminum deposited by the evaporation. The active surface area of the photodiode was 0.81 cm2. Photo-illumination was MIPRO 2016/MEET term of the deeper gap states and those nearer midgap states was higher than those of the shallower energy levels. These results agree with those obtained by oher authors [2], where the capture coefficients of the gap states closer to the midgap were higher than those of the shallow energy levels. III. Figure 3. Preexponential terms obtained from measured transient responses of a-Si:H p-i-n photodiode by blue light pulse at blue bias light at reverse bias voltages from 0 V to 2 V. obtained through the bottom p-type layer. The transient response of a-Si:H photodiode was measured as a response to the light pulses superimposed to the constant light (optical bias dependence of modulated photocurrent method - OBMPC) of the same wavelength of 430 nm and at reverse voltages from 0V to 2V. Samples were measured at the room temperature. Photocurrent was measured directly on the 10 kΩ load resistor. The two blue (B) light LED for the probe (ac) and the pump (dc bias) from Multicolor LED lamp were used in the experiments, emitting at 430 nm. The emitted photons are of energies higher than the band gap energy. The optical light powers were defined at 20 mA LED bias current and the probe pulse period 3 ms with 50% duty cycle. The measured a-Si:H photodiode switch-off transient response on a 3 ms light pulses of blue light at bias blue light and at 1.5 V reverse bias is shown in Fig. 1. The responses at bias voltages of 0 V, 0.5 V, 1 V, 1.5 V and 2V were analyzed as a sum of decaying exponential functions using the least squares method and a generalized Fosse's algorithm as described above. From Fig.1 it is evident that for 1.5 V reverse bias voltage, the states with shallower energy (E1 = 0.4397 eV) and corresponding preexponential term 2.133·10-6 have a small contribution to total photocurrent. The deeper energy state (E2 = 0.4492 eV) with a 1.2886·10-5 preexponentail term prevails in the transient response. It can be concluded that second energy states behave as deep acceptor localized states. The calculated value of DOS is 5·1014 cm-3eV-1 centered at EC-Ei=0.45 eV with EC-Etn=0.66 eV and EC-Eωn=0.5 eV. The calculated energies of localized energy levels are presented in Fig. 2. The summary measurements from the system consist of the sum of two exponential functions in all the cases, as shown in Fig. 1. The pre-exponential factor yields the information about presented species of localized states is shown in Fig 3. The preexponentail MIPRO 2016/MEET CONCLUSION Our experiments indicate that the used frequency of modulated light is lower then the critical frequency and therefore fall in a low frequency regime. The experiments are done using the same light intensity for biased and modulated light. Under those conditions the trap centers act as a recombination centers and influence the a-si:H pi-n photodiode transient response. The results of OBMPC experiment have been collected and analyzed in time domain to characterize the behavior of a-Si:H p-i-n photodiode. 
For the selected light pulse frequency and the low bias voltages on photodiode, the LF regime can be used to determine DOS values and energy levels. Our results show the presence of two energy levels and their influence on a-Si:H p-i-n transient response on blue light pulses at bias blue light. The preexponentail term of the deeper gap states and those nearer midgap states was higher than those of the shallower energy levels. These results agree with those obtained by other authors [2], where the capture coefficients of the gap states near midgap found higher than those of the shallow energy levels. ACKNOWLEDGMENT We thank the referees whose comments improved the paper and DKJ for helping with English REFERENCES Melskens, J. et al., IEEE J. Photovolt., 6 (2014) 1331-1336 Year: 2014, Volume: 4, Issue: 6 Pages: 1331 - 1336, DOI: 10.1109/JPHOTOV.2014.2349655 [2] Pomoni, M., Kounavis, P., Phil. Mag., 94:21, (2015) 2447-2471. [3] Shen, D.S., Wagner, S. , J. Appl. Phys. 79, (1996) 794 – 801. [4] Foss, S.D., Biometrics, 26 (1970) 815-821. [5] Matheson, I.B.C., Anal. Instr., 16 (1987) 345-373. [6] Jericevic, Z. et al., Adv. Cell Biol., 3 (1990) 111-151. [7] Jericevic, Z. “Method for Fitting a Sum of Exponentials to Experimental Data by Linearization Using Numerical Integration Approximation, and Its Application to Well Log Data”, USP #7,088,097, 2006. [8] Kounavis, P. , J. Appl.Phys., vol. 77, (1995) 3872-3878. [9] Kleider, J.-P. et al., Phys. Stat. Sol.C, 5 (2004) 1208-1226. [10] Gradisnik, V. et al., IEEE Trans. Electron Devices, 49, (2002) 550 - 556. [11] Gradisnik, V et al., IEEE Trans. Electron Devices, 53, (2006) 2485 – 2491. [1] 33 Analysis of Electrical and Optical Characteristics of InP/InGaAs Avalanche Photodiodes in Linear Regime by a New Simulation Environment Tihomir Knežević and Tomislav Suligoj University of Zagreb, Faculty of Electrical Engineering and Computing, Micro and Nano Electronics Laboratory, Croatia tihomir.knezevic@fer.hr Abstract - The linear characteristics of the InP/InGaAs avalanche detectors are modeled and numerically analyzed by developing a new TCAD-based simulation environment. Temperature dependency of the impact ionization coefficients in InP are fitted for 200 K to 300 K temperature range. Adjustment of the model parameters for the simulations of the dark current sources in InP and InGaAs materials is performed in the same temperature range. Optical constants of the InGaAs material used in the layer stack are fitted to account for the absorption in the material for a range of wavelengths between 0.9 and 1.7 µm. Dark current and I-V characteristics under illumination are simulated and analyzed. Impact of the operating temperature on responsivity, breakdown voltage and dark current are analyzed. Excess noise factor is also calculated. Process simulations of Zn diffusion into InP are included in the TCAD simulator and the impact of the real diffusion profiles on the diode characteristics are assessed. The dark current for the structure with diffused Zn p+ region decreases by a factor of 1.7 compared to the structure with box-like constant concentration p+ region extracted at operating temperature of 200 K at 90% of VBR. I. INTRODUCTION Low-light detection in the near-infrared range is commonly achieved by InP/InGaAs heterostructures employed in separate absorption, grading, charge and multiplication (SAGCM) avalanche photodiodes (APD). Avalanche photodiodes can be operated in linear or “Geiger” mode. 
In comparison to the standard pin photodiodes, InP/InGaAs APDs operated in linear mode have a higher sensitivity which promotes them to a device of choice for the optical communication systems [1]. Photodiode detectors operated in “Geiger” mode, above breakdown voltage, are called single-photon avalanche diodes (SPADs). Single-photon detection in the wavelength range above 1 µm is important for quantum cryptography [2], eye-safe laser ranging (Light Detection And Ranging – LIDAR) [3], time-resolved spectroscopy, photon-counting optical communication [4], etc. In “Geiger” mode, photogenerated carriers can produce selfsustaining avalanche. Avalanche is stopped by using a quenching circuit [5] and a SPAD is ready to detect a new photon. In SAGCM structures, InGaAs (In0.53Ga0.47As) is used as an absorbing region and its lattice is matched to the one 34 of InP. Bandgap of the InGaAs layer is 0.75 eV at room temperature and the layer is used for detection of light with wavelengths in range between 0.9 and 1.7 µm. SAGCM structure provides a way to limit the dark current due to the tunneling in a narrow bandgap InGaAs layer. High-field multiplication region is located in the low-doped InP layer while the charge layer limits the spread of the electric field to the absorption region. Reduced tunneling and impact ionization in the InGaAs layer contribute to the decreased dark current and improved overall performance of the photodetector. Thermally generated carriers, tunneling currents and background photons all give rise to dark current, which decreases the sensitivity of the APD operated in the linear mode. The same is true for SPADs where the unwanted carriers can trigger an avalanche. This introduces noise in the operation of the SPAD called Dark Count Rate (DCR). Device structure e.g. layer stack thicknesses and doping concentrations can impact the performance of the APDs by changing the magnitude of the dark current coming from trap-assisted tunneling (TAT) [6]-[8]. On the other hand, thermally generated dark current is commonly reduced by decreasing the operating temperature of the InP/InGaAs APDs. The key element in designing a high performance APDs is the ability to predict the device behavior for different structure parameters. There are simulator environments and analytical models capable of analyzing and simulating optical and electrical characteristics of the InP/InGaAs photodiodes [6], [9]-[13]. However, none of them exploits the functionality of a TCAD software to model the linear InP/InGaAs APD characteristics in the 200 to 300 K temperature range for realistic p+ region doping profiles. A new simulation environment using Sentaurus TCAD is developed. This TCAD-based environment is capable of simulating electrical and optical characteristics of the InP/InGaAs APDs. In this paper, fitted TCAD models enable the simulations of the avalanche generation in InP, thermal generation in InP, InGaAs and InGaAsP and trapassisted tunneling in InP. Process simulations of Zn diffusion into InP are also implemented in the simulation environment. Optical and electrical simulations of the active part of the structure proposed by Liu et al. [6] are performed for temperatures of 200 K and 300 K. Excess noise factor is simulated for the same structure. Impact of MIPRO 2016/MEET the realistic p+ region diffusion profiles on the electrical characteristics is also assessed. II. DEVICE SIMULATIONS A. 
Fitting of the physical model parameters for InP/InGaAs APD device simulations Commercially available Sentaurus TCAD software from Synopsys is used for the simulations of the InP/InGaAs structure. The TCAD software is capable of performing both device and process simulations. Device simulator [14] can be used for electrical, optical and thermal simulations of the user-defined geometry with different materials. However, simulator provides full functionality of all the models for Si materials. In order to use the physical models for simulations of InP, InGaAs and InGaAsP materials for low temperatures, the physical model parameters should be fitted and properly tested. Doping profile of the analyzed InP/InGaAs APD structure is shown in Fig. 1 (a). The parameters of the structure such as doping concentrations and layer thicknesses are almost identical as those in [6]. Buffer layer with thickness of 2 µm is defined on top of the bulk n+ InP region. Buffer layer is followed by the InGaAs absorption region with thickness of 3 µm. Between the InGaAs layer and InP there is a InGaAsP grading layer, which serves to improve the transient characteristics of the device. Field stop or charge region with thickness of 1.2 µm and doping p InP 19 10 18 10 17 10 16 10 15 10 14 Absorption region p+ InP Buffer layer p+ InP n- InP n InP InGaAsP -3 Doping concentration (cm ) 10 Charge region Grading Multip. (a) + Bulk InP n+ InP n- InGaAs n InP n+ InP bulk n InP concentration of 2.1 · 1016 cm-3 is used to limit the spread of the electric field into the InGaAs layer. Low-doped n- InP layer is defined on top of the charge layer. P+ region is defined as a highly doped box-like p+ region with the constant doping concentration of 2 · 1018 cm-3, in the first approximation. Multiplication region is defined to be 0.5 µm thick. Band diagram of the structure at 0 V, 300 K is shown in Fig. 1 (b). The difference in the bandgap of the InP and InGaAs materials causes valence and conduction band discontinuities. Valence band discontinuity can limit the speed of the device since the generated holes have to overcome the potential barrier. Grading layer decreases the valence discontinuity increasing the response speed of the device. In order to obtain the current-voltage characteristics of the InP/InGaAs diode, the impact ionization coefficients for InP material must be fitted first. TCAD simulator provides different models capable of modeling impact ionization coefficients in wide electric field and temperature range. However, the temperature dependency of the used Okuto-Crowell model for impact ionization in InP could not be obtained just by fitting the appropriate coefficients. Therefore, the parameters for the electron and hole impact ionization coefficients a, b and δ are fitted in the different temperature steps to the impact ionization values from the analytical model from [10]. In [10] the analytical expression was constructed to obtain best fit to the measured data and represents a quasi-physical model. The comparison of the fitted impact ionization coefficients for holes and electrons to the values obtained by the analytical model for temperatures of 200 K and 300 K is plotted in Fig. 2. Excellent fit to the experimental data can be achieved. Using the fitted impact ionization coefficients, the current-voltage characteristics of the structure proposed in Fig. 1 are simulated. Extracted breakdown voltage (VBR) at 300 K is 79 V. Measured VBR for the same structure is 75 V [6] which is in good agreement to the simulations. 
InGaAsP n InP n- InGaAs n- InP 0 1 2 3 4 5 6 7 8 9 10 Depth (µm) p InP 2.0 Charge region Grading Multip. (b) + Absorption region Buffer layer Identifying the dark current sources at different temperatures is very useful for proper modelling and optimization of the electrical and optical characteristics of APDs operated in linear mode. Current-voltage characteristics of the linear APDs contain the information on the dark current sources. Fitting the model parameters of TAT for InP and SRH for InP, InGaAs and InGaAsP for various temperatures is a difficult task due to the lack of the Bulk InP 10 EC 1.5 10 4 10 3 10 2 10 1 10 0 -1 Ionization rate (cm ) 0.5 EF 0.0 EV -0.5 -1.0 -1.5 electrons 200 K -2.0 0 Okuto-Crowell fit Holes: T=300 K T=200 K Electrons: T=300 K T=200 K holes 1.0 Energy (eV) 5 1 2 3 4 5 6 7 8 9 10 Depth (µm) Figure 1. Cross-section of the simulated structure. (a) Doping profile of the SAGCM InP/InGaAs APD. (b) Band diagram of the structure at 0 V; 300 K. MIPRO 2016/MEET 2.0 Analytical model Holes: T=300 K T=200 K Electrons: T=300 K T=200 K 2.5 300 K 200 K 300 K 3.0 3.5 4.0 4.5 1/E (cm/MV) Figure 2. Fitted Okuto-Crowel impact ionization coefficients for electrons and holes for temperatures of 200 K and 300 K. 35 B. Electrical and optical analysis of the InP/InGaAs APD The fitted parameters for the dark current generation are then used in current-voltage simulations of the structure depicted in Fig 1. Simulation results of the current-voltage characteristics for temperatures of 200 K and 300 K are shown in Fig. 3. For both temperatures, the majority of the dark current is originating in InP. The contribution to the dark current from InGaAs and InGaAsP starts at approximately 40 V, which is the punchthrough voltage when those regions become fully depleted. Contribution to the dark current from InGaAsP is negligible at both temperatures. For the temperature of 300 K, the contribution to the total dark current from the InGaAs is comparable to the contribution to the dark current from InP. On the other hand, the contribution of the dark current from InGaAs at 200 K is almost two orders of magnitude lower than the contribution to the dark current from InP. Furthermore, there is an increase of the dark current coming from InP that starts at around 35 V. The origin of this dark current is TAT from InP layer. These results are in agreement to the results reported in the literature where the dominant mechanism that determines the DCR at lower temperatures is TAT. TAT is caused by the high electric field in the multiplication region and can be controlled by decreasing the trap concentration in the region or by adjusting the geometry of the device and increasing the multiplication region thickness. Optical simulations of the structure are performed and the results of the current-voltage characteristics are shown in Fig. 4 (a). Simulations are done for temperatures of 200 K and 300 K. Complex refractive index for 2 Current density (A/µm ) -10 10 -11 10 -12 10 -13 10 -14 Total 10 -15 10 T = 200 K -16 10 Total -17 InP 10 InGaAs -18 10 InGaAsP -19 10 -20 10 Total -21 10 -22 10 20 30 T = 300 K InP InGaAs InGaAsP InP T = 300 K Total InP InGaAs InGaAsP InGaAs InGaAsP T = 200 K 40 50 60 70 80 Reverse voltage (V) Figure 3. Current-voltage characteristics of the proposed InP/InGaAs APD structure at temperatures of 200 K and 300 K. Symbols: total current; lines: contribution to the dark current from InP, InGaAs and InGaAsP. 
36 (a) 2 Current density (A/µm ) -8 10 -9 10 -10 10 -11 10 -12 10 -13 10 -14 10 -15 10 -16 10 -17 10 -18 10 -19 10 -20 10 -21 10 -22 10 Illuminated Optical generation λ = 1.5 µm -3 2 I = 10 W/cm T = 300 K T = 200 K T = 300 K Dark current Dark current T = 300 K T = 200 K T = 200 K 0 10 20 30 40 50 60 70 80 Reverse voltage (V) (b) 10 2 Gain T = 300 K T = 200 K 10 1 10 0 T = 200 K T = 300 K Gain experimental data. However, there are plenty of literature sources describing the impact of these dark current sources on the DCR of SPADs at various temperatures. We used numerical computation to calculate probabilities that a hole or an electron will initiate an avalanche in [15]. Electron and hole avalanche probabilities are used together with the carrier generation rate profiles obtained from TCAD simulations to calculate DCR. Parameters of the TCAD models for SRH from InP, InGaAs and InGaAsP and TAT from InP are fitted and their contributions to DCR show excellent agreement for various temperatures and device geometry parameters as compared to the measured and simulated data. 10 -1 10 -2 20 30 40 50 60 70 80 Reverse voltage (V) Figure 4. (a) Current-voltage and (b) gain characteristics of the illuminated InP/InGaAs APD for temperatures of 200 K and 300 K. In0.53Ga0.47As material is not available in the simulator so the fitting to the complex refractive index from [16] is performed. The structure is exposed to the light with wavelength of 1.5 µm and the intensity of light is 10-3 W/cm2. For both temperatures, the punchthrough voltage is around 40 V. On the other hand, the breakdown voltage changes from 58 V to 79 V for the temperatures of 200 K and 300 K, respectively. The punchthorugh voltage of 60 V is reported in [6]. The difference can be attributed to the variations of the doping profiles and thicknesses of the multiplication region. Current-voltage characteristics of the illuminated structure is used to calculate the gain characteristics of the InP/InGaAs diode. Unity gain is defined to be at the diode voltage where the quantum efficiency reaches 80 %. Gain characteristics are depicted in Fig. 4 (b). Since the punchtrough voltage is almost the same for both temperatures and the breakdown voltage is smaller at 200 K, the multiplication gain increases more swiftly for the device operated at 200 K. C. Excess noise characteristics Statistical fluctuation of the avalanche process generates noise in the electrical current. In the operation of the linear device, knowledge on the noise that the APD introduces in the electronic system is of utmost importance. Analytical expression for calculation of the excess noise factor (F) is derived in [17]:  =  1 − (1 − ) ∙ ሺெିଵሻమ ெమ , (1) MIPRO 2016/MEET where M is multiplication gain and k is the ratio of the maximum ionization coefficients of electrons and holes. Excess noise factor is calculated for InP/InGaAs APD structure for temperatures of 200 K and 300 K and is depicted in Fig. 5. Multiplication gain as a function of voltage is determined by the exposure to the light with wavelength of 1.5 µm and intensity of 10-3 W/cm2. Ionization coefficient profiles for electrons and holes are extracted from 1D simulations for the same bias voltages. Excess noise factors at M = 10 are 4.3 and 4.6 for temperatures of 200 K and 300 K, respectively. Using the APD diode at lower temperatures improves the overall sensitivity of the device for optical detection due to the decrease of the dark current. Lowering the temperatures also decreases the excess noise factor. 
Other than the decrease of the temperature, excess noise factor depends on the structure of the layer stack of the APD, which is not analyzed in this paper. III. PROCESS SIMULATIONS A. Fitting of the Zn diffusion model parameters Zn diffusion is commonly used in the formation of the pn-junctions in InP/InGaAs APDs. Process simulations of the Zn diffusion into InP are also added to the TCAD simulation environment. Currently, Zn diffusion into InP is not modeled in the Sentaurus Process [21]. Therefore, we performed calibration of the diffusivity models in order to simulate the Zn diffusion. Device simulations using the Zn diffusion profiles are important for obtaining realistic electrical and optical characteristics of APDs. Process simulations of the Zn diffusion along with device simulations can also be used in the design of the guard rings of InP/InGaAs APDs. SIMS profiles of Zn diffusion into InP are reported in various literature sources [18]-[20]. Mechanisms governing Zn diffusion and the diffusion models are discussed in [18]. The diffusion is dominated by interstitial-substitutional mechanism where Zn diffuses as a singly ionized interstitial. Zn diffusivity is proportional to the hole concentration and the background concentration can significantly reduce the Zn diffusion. Using the SIMS profiles from the literature and calibrating the simulator parameters, Zn diffusion was simulated in Sentaurus Process simulator. Sentaurus Process offers a number of different diffusion and impurity activation models that could be used for modeling of the Zn diffusion. We focused on the constant diffusion model and Fermi diffusion model. Contrary to constant diffusion model, Fermi diffusion model takes into account the dependency of diffusivities on the electron (hole) concentrations [21]: డ஼ಲ డ௧ ௡ ି௖ି௭ ௡೔ ∇ ଴ ஺௑ ೎ = ஺௑ ೎ exp − ௭ ା ௡ ஺ ௡ ಶ ஽ಲ೉ ೎ ௞் , 20 -3 Concentration (cm ) 6 4 10 19 Fermi c = 1 Fermi c = 2 10 SIMS 18 10 17 10 16 Process simulations: Fermi model (c = 2) Fermi model (c = 1) Constant diffusion 0.0 0 5 10 15 20 25 30 35 40 Multiplication gain Figure 5. Excess noise factor of the simulated InP/InGaAs APDs for temperatures of 200 K and 300 K MIPRO 2016/MEET Constant diffusion Sulfur Excess noise factor T = 300 K T = 200 K 0 SIMS from [18] Zn T = 200 K 2 (3) The results of the additional verification of the fitted Fermi diffusion model are plotted in Fig. 7. (a) and (b). In Fig. 7 (a) the diffusion is simulated with temperatures of 475° C and 500° C and the durations of 30 min and 15 min, respectively. Bulk doping concentration is 2·1016 cm-3. Excellent matching of the simulated profile to the SIMS 30 min @ T = 475° C T = 300 K (2) Process simulations of the Zn diffusion using the proposed models are compared to the SIMS data from [18] and depicted in Fig. 6. Sulfur background concentration of 2·1016 cm-3 in InP is used. Diffusion assumes constant surface concentration of 8·1018 cm-3. The diffusion time is 30 min at 475° C. Zn diffusion profile obtained by the constant diffusion models largely underestimates the shape of the real diffusion profile obtained from SIMS. Femi diffusion model with c = 2 has a sharper decrease of the Zn concentration near the pn-junction than the real SIMS profile. On the other hand, Fermi diffusion model with c = 1 can excellently fit the SIMS data. Values of the fitting ଴ ா -3 constants ஺ூ cm2/s and 1.75 eV, ೎ and ஺ூ ೎ are 10 respectively. 
10 8 , ೔ where CA is the concentration of substitutional dopands, CA+ is the active portion of CA, c is the charge state of point defect, z is the charge state of dopant A, ni is the intrinsic concentration, n is the electron (hole) carrier concentration, X is either interstitial or vacancy, k is the Boltzmann ଴ ா constant, T is the temperature, ஺௑ ೎ and ஺௑ ೎ are Fermi diffusion constants that can be calibrated in the process simulator. Dopant activation is modeled by solid solubility model. Parameters of the solid solubility model are taken from [18]. 10 Excess noise factor = ∇ ∑௑,௖ ஺௑ ೎  0.5 1.0 1.5 2.0 Depth (µm) Figure 6. Comparison of the Zn diffusion profile obtained by constant diffusion model, Fermi model with c = 1 and c =2 with SIMS Zn profiles from [18] 37 Process simulations: 30min @ 475° C 15min @ 500° C Zn 10 19 10 18 10 17 -3 SIMS from [18]: 30min @ 475° C 15min @ 500° C 15 min @ 500°C 30 min @ 475°C Sulfur 0.5 1.0 1.5 2.0 2.5 10 17 10 16 10 15 10 14 10 T = 500° C Measurements from [18] Linear fit 8 7 6 5 4 3 2 1 Process simulations 0 0 20 40 60 1/2 Time Absorption region Buffer layer Bulk InP Diffused Zn Zn concentration: Constant concentration Diffused profile Constant Zn n-type doping Sulphur concentration 0 1 2 3 4 5 6 7 8 9 10 Depth (µm) Figure 8. Cross section of the structure with p+ region realized as a box-like constant concentration profile and diffused Zn profile (b) 9 Multip. 18 Charge region 3.0 Depth (µm) Junction depth (µm) 10 p InP 80 100 1/2 (s ) Figure 7. Verification of the fitted Fermi diffusion model parameters. (a) Comparison of simulated doping concentration profile with Zn SIMS profile from [18]. (b) Comparsion of the simulated junction depth with the extracted junction depth from [18]. measurement can be achieved for both diffusion parameters. PN-junction depth is extracted versus the square root of the diffusion time for diffusion temperature of 500° C and depicted in Fig. 7 (b). The simulated pnjunction depth is compared to the measurements [18]. Junction depth displays the same behavior versus t1/2 and is a linear function of t1/2. Simulation results show that calibrated Fermi diffusion model provides a realistic model capable of assessing the Zn diffusion profile. B. Device simulations of APDs with realistic p+ region doping profile The APD fabrication is simulated with the fitted parameters of the Fermi model for Zn and the obtained diffusion profiles are later used in the device simulations. Doping concentration profiles of structures where p+ region is realized as a box-like constant concentration profile and as a diffused Zn profile are depicted in Fig 8. Junction depth in both cases is 2 µm. Temperature used in the simulations of the Zn diffusion is 500° C. The simulated diffusion time is 11.2 min that is needed to achieve 2 µm junction depth. Constant surface concentration of 2 · 1019 cm-3 is assumed. Current-voltage characteristics of the structures with both p+ region defined as a constant concentration and diffused Zn region are depicted in Fig. 8. Breakdown voltage for structure with p+ region simulated with diffused Zn profile increases approximately by 1.5 V for both temperatures. Dark current at 90 % of VBR for device operated at 300 K is 1.26 · 10-13 A/µm2 and 1.27 · 10-13 A/µm2 for structures with box-like constant concentration p+ region and diffused Zn p+ region, respectively. 
On the other hand, dark current at 90 % of VBR at 200 K is 1.5 · 10-17 A/µm2 for structure with box-like constant concentration p+ region and 8.9 · 10-18 A/µm2 for structure with diffused Zn p+ region. The dark current of the structure with diffused Zn p+ region decreases by a factor of 1.7 compared to the structure with box-like constant concentration p+ region. The decrease of the dark current is a result of the reduced TAT. For structure with realistic p+ region, the maximum electric field decreases, resulting in a smaller TAT. Reduced dark current can increase the sensitivity of the APD. The change in the breakdown voltage and current-voltage characteristics shows the importance of using the realistic diffusion profiles in the simulations of the device especially at lower temperatures. This is expected to become even more important in 2D simulations of the APD devices. IV. CONCLUSION Comprehensive TCAD simulation environment capable of simulating both device and process characteristics for InP/InGaAs APDs is demonstrated. This simulation procedure can be used in analysis and optimization of complex InP/InGaAs APD structures both in 1D and 2D. The knowledge of the realistic diffusion 10 -10 10 -11 10 -12 10 -13 10 -14 10 -15 10 -16 10 -17 10 -18 10 -19 10 -20 10 -21 2 0.0 38 19 16 Current density (A/µm ) 10 10 Grading 10 20 + Doping concentration (cm ) 10 -3 Concentration (cm ) (a) 21 T = 300 K + p region: constant conc. T = 300 K T = 200 K + p region: diffused Zn T = 300 K T = 200 K T = 200 K 20 30 40 50 60 70 80 90 Reverse voltage (V) Figure 9. Comparison of the current-voltage characteristics for the structure with box-like constant concentration p+ region and structure with diffused Zn p+ region at temperatures of 200 K and 300 K MIPRO 2016/MEET parameters is an essential parameter for a successful design and analysis of the guard rings at the periphery of the APD. [8] Analysis of the linear characteristics of a InP/InGaAs APD is demonstrated. Fitted model parameters for the impact ionization model, complex refractive index for InGaAs, TAT model for InP and SRH models for all the materials in the layer stack are used in this analysis. Process simulations of the diffusion of Zn into InP are added to the TCAD simulation environment. Impact of the real diffusion profile on the linear characteristics is also analyzed. Real diffusion profile changes the breakdown voltage and impacts the TAT in InP region. Decrease of the dark current for a factor of 1.7 at 90 % of VBR at 200 K is obtained by using the realistic diffused Zn profile. The importance of using the realistic Zn diffusion profiles is expected to be more critical in 2D simulations of the InP/InGaAs APDs. [9] [10] [11] [12] [13] REFERENCES [1] [2] [3] [4] [5] [6] [7] J. C. Campbell, “Recent Advances in Telecommunications Avalanche Photodiodes”, Journal of Lightwave Technology, vol. 25, no. 1, pp. 109-121, Jan. 2007. N. Gisin, G. Ribordy, W. Tittel, H. S. Zbinden, “Quantum cryptography”, Reviews of Modern Physics, 74 (1), pp. 145-195, Jan 2002. U. Schreiber, C. Werner, “Laser Radar Ranging and Atmospheric Lidar Techniques”, Proceedings of SPIE, SPIE, 1997, Dec. 1997 S. Verghese et al., "Geiger-mode avalanche photodiodes for photon-counting communications," Digest of the LEOS Summer Topical Meetings, 2005, pp. 15-16. S. Cova, M. Ghioni, A. Lacaita, C. Samori, and F. Zappa, "Avalanche photodiodes and quenching circuits for single-photon detection," Applied Optics, Vol. 35, Issue 12, pp. 1956-1976, 1996. Y. Liu, S. R. Forrest, J. 
[8] F. Acerbi, M. Anti, A. Tosi and F. Zappa, "Design Criteria for InGaAs/InP Single-Photon Avalanche Diode", IEEE Photonics Journal, vol. 5, no. 2, p. 6800209, April 2013.
[9] C. L. F. Ma, M. J. Deen and L. E. Tarof, "Multiplication in separate absorption, grading, charge, and multiplication InP-InGaAs avalanche photodiodes", IEEE Journal of Quantum Electronics, vol. 31, no. 11, pp. 2078-2089, Nov. 1995.
[10] J. P. Donnelly et al., "Design Considerations for 1.06-µm InGaAsP-InP Geiger-Mode Avalanche Photodiodes", IEEE Journal of Quantum Electronics, vol. 42, no. 8, pp. 797-809, Aug. 2006.
[11] X. Jiang, M. A. Itzler, R. Ben-Michael and K. Slomkowski, "InGaAsP-InP Avalanche Photodiodes for Single Photon Detection", IEEE Journal of Selected Topics in Quantum Electronics, vol. 13, no. 4, pp. 895-905, July-Aug. 2007.
[12] M. Anti, F. Acerbi, A. Tosi and F. Zappa, "2D simulation for the impact of edge effects on the performance of planar InGaAs/InP SPADs", Proc. SPIE 8550, Optical Systems Design 2012, 855025.
[13] M. Anti, F. Acerbi, A. Tosi and F. Zappa, "Integrated simulator for single photon avalanche diodes", 11th International Conference on Numerical Simulation of Optoelectronic Devices (NUSOD), Rome, 2011, pp. 47-48.
[14] Sentaurus Device User Guide, Synopsys, Mountain View, CA, USA, Mar. 2016.
[15] T. Knežević, T. Suligoj, "Examination of the InP/InGaAs Single-Photon Avalanche Diodes by Establishing a New TCAD-based Simulation Environment", submitted for publication.
[16] S. Adachi, "Physical Properties of III-V Semiconductor Compounds: InP, InAs, GaAs, GaP, InGaAs, and InGaAsP", John Wiley & Sons, 1992.
[17] R. J. McIntyre, "Multiplication Noise in Uniform Avalanche Diodes", IEEE Trans. Electron Devices, vol. ED-13, pp. 164-168, 1966.
[18] G. J. van Gurp, P. R. Boudewijn, M. N. C. Kempeners and D. L. A. Tjaden, "Zinc diffusion in n-type indium phosphide", Journal of Applied Physics, vol. 61, pp. 1846-1855, 1987.
[19] S. Y. Yang and J. B. Yoo, "Characteristics of Zn diffusion in planar and patterned InP substrate using Zn3P2 film and rapid thermal annealing process", Surface and Coatings Technology, vol. 131, issues 1-3, pp. 66-69, 2000.
[20] H. S. Marek and H. B. Serreze, "Diffusion coefficients and activation energies for Zn diffusion into undoped and S-doped InP", Applied Physics Letters, vol. 51, pp. 2031-2033, 1987.
[21] Sentaurus Process User Guide, Synopsys, Mountain View, CA, USA, Mar. 2016.
Design of Passive-Quenching Active-Reset Circuit with Adjustable Hold-Off Time for Single-Photon Avalanche Diodes
I. Berdalović*, Ž. Osrečki*, F. Šegmanović*, D. Grubišić**, T. Knežević* and T. Suligoj*
* Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia
** Laser Components DG, Inc., Tempe, Arizona, USA
tomislav.suligoj@fer.hr
Abstract - Single-photon avalanche diodes (SPADs) are gaining popularity in applications where low-intensity light needs to be detected. Since they are used in Geiger mode, where the self-sustaining avalanche needs to be quenched, an important part of the detection circuitry is the quenching circuit.
First, we examine the operation of a basic passive quenching circuit consisting of the SPAD and two series resistors and measure the SPAD’s dark count rate. Then we implement a passive quenching circuit with active reset (PQAR). Without a sufficiently long hold-off time between quenching and reset the circuit does not operate properly. Because of that, a hold-off time is introduced by means of an adjustable time delay circuit. The behavior of the PQAR circuit for different hold-off times is then examined, and the minimum hold-off time of 1 μs, which still allows for correct operation with the given circuitry is determined. Finally, a comparison is made between the passive and the PQAR circuit, focusing on the advantages of active over passive reset. I. INTRODUCTION A single-photon avalanche diode (SPAD) is a solidstate photodetector used for detecting low-intensity optical signals [1]. Low cost, miniature size, higher quantum efficiency and low voltage operation made the SPAD a replacement for photomultiplier tubes (PMT) in many applications today. The SPAD is basically a p-n junction reverse biased above the breakdown voltage (as opposed to avalanche photodiodes operated below the breakdown voltage) and thus operated in Geiger-mode, where each electron-hole pair can trigger an avalanche multiplication process [2]. The avalanche current rises swiftly until quenched by an external circuit. The leading edge of the current pulse gives information about photon arrival time. However, the avalanche multiplication process can also be triggered by thermally generated electron-hole pairs inside the active region or by the charge released from deep-level traps. These mechanisms give rise to dark count rate (DCR) and represent the noise of a SPAD. Charge released from deep-level traps can cause afterpulsing [3]. The avalanche, once triggered, is self-sustained until quenched by a quenching circuit, during an interval called the quenching time. The quenching circuit also has to This work was supported by the Croatian Science Foundation under contract no. 9006. 40 detect the avalanche, produce a readable output signal and prepare the diode for new detections, all that during a time interval called dead time [4, 5]. The dead time limits the SPAD’s maximum operating frequency, because the diode is not able to detect photons in that period. There are two basic types of quenching circuits: passive and active [4]. Passive quenching circuits shut down the avalanche process by reducing the diode’s voltage below the breakdown voltage by means of a ballast resistor connected in series with the diode. The avalanche current creates a voltage drop on the resistor, thereby reducing the voltage on the diode below the breakdown voltage [6]. Active quenching circuits use external circuitry to shut down the avalanche, and an improved performance can be achieved, with circuit complexity as a disadvantage. After quenching is complete, the diode voltage must be restored to its operating value. This can be done passively (long reset time, simple circuit) and actively (small reset time, complex circuit). The type of quenching circuit determines the type of SPAD operation: free-running or gated. In free-running mode, the SPAD is constantly biased above the breakdown voltage, as opposed to gated mode operation, where the bias voltage is periodically lowered. In gated mode, the diode is only active when a photon needs to be detected, therefore, the frequency of incoming photons must be known before detection. 
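The passive quenching and reset mechanism described above can be captured by a two-number estimate: the steady current allowed by the ballast resistor once the diode voltage has collapsed to breakdown, and the RC time constant with which the excess bias is restored. The sketch below is only an order-of-magnitude illustration; the 560 kΩ ballast resistor and the 2 V excess bias match values used later in the paper, while the 15 pF anode capacitance is an assumption.

```python
# Rough passive-quench / passive-reset estimate with assumed element values.
# R_ballast and V_ex match values quoted later in the paper; C_anode (15 pF)
# is an assumed parasitic anode capacitance used only for illustration.
import math

V_ex      = 2.0       # V, excess bias above breakdown
R_ballast = 560e3     # Ohm, ballast (quenching) resistor
C_anode   = 15e-12    # F, assumed parasitic anode capacitance

# Once the diode has dropped to breakdown, the ballast resistor limits the
# steady current to a few microamperes, which cannot sustain the avalanche.
print(f"steady current limit: {V_ex / R_ballast * 1e6:.1f} uA")

# Passive reset: the anode node recovers exponentially with tau = R * C.
tau = R_ballast * C_anode
print(f"reset time constant tau = {tau * 1e6:.1f} us")
for n in (1, 3, 5):
    print(f"after {n} tau ({n * tau * 1e6:5.1f} us): "
          f"{100 * (1 - math.exp(-n)):.1f} % of the excess bias restored")
```

With these assumed values the passive reset takes a few tens of microseconds, the same order as the reset time measured in Section II.A below, which is precisely the limitation that motivates the active-reset scheme.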
In free-running mode, the diode can detect randomly incoming photons and is only limited by the dead-time needed to quench the avalanche from a previous detection and restore the operating voltage. The diode voltage is restored to its operating value in a period called the reset time. During the reset time interval, the diode is still biased above the breakdown voltage and the avalanche can be triggered, so there must be a certain period after quenching and before reset where leftover charge is removed from the diode. This time interval is called hold-off time [4]. All residual charge can cause afterpulsing during and after reset, thereby reducing the possibility of detecting the next photon. As a trade-off between the circuit complexity and performance, a passive quenching active reset (PQAR) circuit passively quenches the avalanche and actively restores the SPAD’s operating voltage, shortening the reset time [7]. There are different techniques for afterpulsing reduction in PQAR circuits, some of them MIPRO 2016/MEET being variable-load [8] and gated mode PQAR quenching circuits [9]. Monolithic quenching circuit design results in lower parasitic capacitances and higher performance [10]. In this paper, a PQAR circuit with adjustable hold-off time is designed. First, with a simple passive quenching circuit, the diode’s voltage and avalanche current waveforms are measured. An active reset mechanism is then implemented using a switching transistor in parallel to the ballast resistor. However, if the active reset starts during passive quenching, the avalanche is not properly quenched and the circuit does not function as wanted. The introduction of a hold-off time between quenching and reset results in a properly functioning quenching circuit. As a means of hold-off time optimization, a quenching circuit with adjustable hold-off time is designed with offthe-shelf components. With negligible complexity, the circuit can be used to determine minimum hold-off time for a certain SPAD with respect to afterpulsing probability and counting frequency. Dark counts are used as a trigger for avalanche multiplication in all measurements. II. QUENCHING CIRCUITS A. Passive Quenching Circuit The passive quenching circuit with passive reset is the basic circuit to shut down the avalanche of the photodiode. As shown in Fig. 1, it consists of a reverse biased SPAD in series with a large ballast resistor RA=560 kΩ and a small resistor RC=1 kΩ. The SPAD used in this and all subsequent circuits is a SAP500 from Laser Components DG [11]. It is operating at a voltage VOP=141.1 V, which is higher than the breakdown voltage VBR=139.1 V by the overvoltage VEX=VOP–VBR=2 V. When an avalanche is triggered, the current through the diode increases and the voltage drop across the resistor RA increases, causing the diode voltage to decrease below the breakdown voltage. As that happens, the avalanche is quenched and the diode current becomes negligible. The voltage drop across the resistor RC, shown in Fig. 2 a), is in fact caused by the diode current, so the cathode voltage waveform is the same as the diode current waveform, shown enlarged in Fig. 3 (curve a). We can see that the quenching time, i.e. the time until the avalanche current drops, is approximately 50 ns. Since RA is loaded with a significant capacitance, namely the oscilloscope input capacitance and other Figure 1. Schematic of the passive quenching circuit. MIPRO 2016/MEET (a) (b) Figure 2. 
Transient response of the passively quenched SPAD: (a) AC coupled cathode voltage and (b) anode voltage.
parasitic capacitances, the currents through RC and RA are different. Most of the diode current is actually capacitance-charging current, and only a fraction of the diode current flows through RA, producing a voltage drop of around 5 V only, as seen in Fig. 2 b). After the avalanche is quenched, the anode capacitance has to discharge through the ballast resistor RA and the voltage decreases to zero with an RC time constant determined by the anode capacitance and RA. As shown in Fig. 2 b), this time constant is rather large, so the reset time of the circuit is around 30 μs. Thus far, we have described the case with an oscilloscope probe connected to the anode. The anode capacitance also determines the peak avalanche current. As shown in Fig. 3 (curve b), the peak current is only about 0.3 mA without the oscilloscope probe connected to the anode, since less current is needed to charge a smaller CA.
B. PQAR Circuit without Hold-Off
To improve the passive reset, we wanted to implement a passive quenching circuit with active reset. The first idea was to use a constant level discriminator (CLD) to detect the cathode voltage drop. The output of the CLD is connected to the gate of a MOSFET, and when it goes high, it switches the MOSFET on, effectively connecting the anode to ground and immediately resetting the circuit. The schematic of this circuit is shown in Fig. 4.
Figure 3. Transient response of the passively quenched SPAD: diode current a) with and b) without the oscilloscope probe connected to the anode.
Figure 4. Schematic of the PQAR circuit without hold-off.
The threshold voltage of the CLD is determined using two resistors, R2=1 kΩ and R3=4.7 kΩ, connected as a voltage divider from the negative supply to the positive input. This gives us a threshold of about VTR=–0.85 V. The negative input of the CLD is connected to the AC coupled cathode. Thus, when the cathode voltage drops by more than 0.85 V, the CLD is triggered. The output capacitance of the MOSFET (BS107) is 30 pF, which is larger than the oscilloscope capacitance, so this is the dominant capacitance in CA. As a result, the peak avalanche current is now about 2 mA, producing a cathode voltage drop of nearly 2 V, which is more than enough to trigger the CLD. Fig. 5 shows the transient response of this circuit.
Figure 5. Transient response of the PQAR circuit without hold-off: (a) AC coupled cathode, (b) anode, (c) output of the CLD (node 1).
It can be clearly seen that the anode voltage is indeed reset when the MOSFET is switched on. However, after the MOSFET is switched off, the anode voltage rises again and, this time, resets slowly, as in the case of passive reset. This is caused by the fact that the reset occurs too soon, before the avalanche is properly quenched. Because of that, a hold-off time is needed between quenching and reset.
Figure 6. Schematic of the PQAR circuit with adjustable hold-off time, including the pulse generator circuit with the adjustable time delay.
C. PQAR Circuit with Adjustable Hold-Off Time
We have concluded in Section II.B that the PQAR circuit without hold-off between quenching and reset does not operate as wanted. Therefore, we have implemented a pulse generator circuit with adjustable time delay between the output of the CLD and the gate of the MOSFET. The purpose of this circuit is to create a voltage pulse at the gate with adjustable duration after the output of the CLD goes high.
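Returning briefly to the CLD threshold quoted in Section II.B above, the value follows directly from the R2/R3 divider, so a one-line check is easy to add. The negative supply value is not given in the text, so the -5 V used below (and the assumed divider orientation, with R2 between the tap and ground) are assumptions; with them, the divider lands close to the quoted -0.85 V.

```python
# Quick check of the CLD threshold set by the R2/R3 divider described above.
# The negative supply value is not stated in the paper; -5 V is assumed here,
# with R2 from the tap to ground and R3 from the tap to the negative supply
# (an arrangement inferred from the quoted threshold).
R2 = 1.0e3      # Ohm
R3 = 4.7e3      # Ohm
V_neg = -5.0    # V, assumed negative supply

V_tr = V_neg * R2 / (R2 + R3)
print(f"CLD threshold ~ {V_tr:.2f} V (the paper quotes about -0.85 V)")
```

Only a cathode dip below this level is needed to trigger the CLD, which is why the roughly 2 V cathode drop produced by the 2 mA avalanche current is more than sufficient.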
The schematic of the whole quenching circuit including the pulse generator is shown in Fig. 6. As in the previous circuit, the cathode voltage drop caused by an avalanche event triggers the CLD (NE521), and its output goes high while the AC coupled cathode voltage is below its threshold voltage. This in turn triggers the J-K flip-flop (74LS73), its Q output goes high and charges the capacitor C2 with an RC time constant determined by R5 and C2. When the capacitor voltage reaches the threshold voltage of the inverting Schmitt triggers (74HC14), the JK flip-flop is reset asynchronously. The voltage at node 3 goes high and triggers the pulse shaper (74HC74), which subsequently creates a 12 ns pulse at the gate of the MOSFET, switching it on. The RC network consisting of R6 and C3 suppresses the ringing at node 3 and ensures the proper operation of the pulse shaper. The pulse shaper is needed to generate a short pulse at the gate, since node 3 stays high considerably longer, until the voltage of the discharging capacitor C2 falls below the Schmitt trigger threshold. For a higher count rate, it is desirable that the MOSFET is switched on for as short a period of time as possible, in our case about 12 ns. By changing the resistance R5 we can charge C2 with different time constants, which gives us a simple way of adjusting the delay of the gate pulse, i.e. the hold-off time. The voltage waveforms at certain nodes of the described circuit for a hold-off time of approximately 5 μs are shown in Fig. 7. Fig. 8 shows the anode and cathode voltages of the same circuit, but using the maximum value of resistor R5, which is 10 kΩ. This value gives us a maximum hold-off time of approximately 10 μs. Obviously, the longer the hold-off time, the greater the possibility that another avalanche event may occur during that time, i.e. before reset. If that is indeed the case, the new avalanche will occur while the diode is still below its operating voltage, which means that the avalanche current will be significantly smaller. That in turn means that the voltage drop on the cathode will be smaller as well, and this voltage drop may not be sufficient to trigger the constant level discriminator. Therefore, if an avalanche occurs during hold-off, it will not be detected by the circuitry. An example of that can be seen in Fig. 8. The avalanche current caused by the second dark count results in a voltage drop of around 0.5 V across the resistor RC, which is not enough to trigger the CLD. As a result, that count is lost. The diode resets 10 μs after the first dark count. In order to achieve a high counting frequency, the hold-off time must be as short as possible. Because of the intrinsic time delays of the components used, the minimum hold-off time possible with our circuit was determined to be around 300 ns. However, for such short hold-off times, the afterpulsing becomes a possible issue. Afterpulsing occurs when carriers trapped from an MIPRO 2016/MEET (a) (b) (c) (d) (e) (f) Figure 7. Transient response of PQAR circuit with a hold-off time adjusted to 5 μs in this case: (a) SPAD cathode (AC coupled), (b) SPAD anode, (c) Output of the CLD (node 1), (d) Capacitor C2 (node 2), (e) Output of the Schmitt triggers (node 3), (f) MOSFET gate (node 4). avalanche are subsequently released, triggering new unwanted avalanches [12]. That could explain the 43 (a) (a) (b) (b) Figure 8. 
An example of an avalanche pulse occured during hold-off for the PQAR circuit with a hold-off time adjusted to 10 μs: (a) AC coupled cathode voltage and (b) Anode voltage. waveforms obtained with a hold-off time of around 500 ns, shown in Fig. 9. A new unexpected avalanche occurs immediately after reset in some cases, which could be an afterpulse. This effect is much more pronounced for shorter hold-off times. Thus, for a hold-off time of 500 ns, we have obtained a cascade of multiple avalanches, as shown in Fig. 9. The hold-off time could be further reduced by lowering the overvoltage VEX and thus decreasing the afterpulsing probability. However, a lower VEX results in a lower detection efficiency, so there is a trade-off between shortening the hold-off time and keeping the detection efficiency high. For all the reasons above, we choose a hold-off time of 1 μs as the optimum hold-off time for the used SPAD. This time is long enough for the afterpulsing probability to be negligible, but not too long, to minimize the number of dark counts during hold-off and to increase the maximum (c) (d) Figure 10. Transient response of the PQAR circuit with the optimum hold-off time of 1 μs: (a) SPAD cathode (AC coupled), (b) SPAD anode, (c) Output of the CLD (node 1), d) MOSFET gate (node 4). frequency of detected photons. Obviously, for different diodes, this optimum value will vary, and our circuit provides a simple solution for adjusting the desired holdoff time. The voltage waveforms for this final version of the circuit are shown in Fig. 10. (a) (b) Figure 9. Single shot capture of waveforms of the PQAR circuit with a hold-off time of 500 ns with a cascade of afterpulses: (a) AC coupled cathode voltage and (b) Anode voltage. This event occurs in about 1 in 20 measurements. 44 If we compare this PQAR circuit with a hold-off time of 1 μs to the passive quenching circuit using the same resistors in series to the SPAD, we can observe the main advantage of the active over passive reset. While the passive reset time is measured to be around 30 μs, the SPAD with the PQAR circuit is fully reset to its operating voltage in just over 1 μs. That means that the maximum counting frequency of the PQAR circuit is almost 30 times higher than that of the passive circuit, which is a significant improvement. The problems of active reset, such as larger afterpulsing probability, possible detections during hold-off or while the MOSFET is on, are overcome by choosing a suitable hold-off time and by keeping the MOSFET on for very short periods of time, which is achieved by applying short pulses to the gate of the MOSFET. MIPRO 2016/MEET III. CONCLUSION The aim of this paper was to analyze the impact of hold-off time on the performance of a PQAR circuit for SPADs. The behavior of the SPAD was examined during the operation with a simple passive quenching circuit. A need for a hold-off time is demonstrated on a simple PQAR circuit where reset starts during quenching. Then, a PQAR circuit with adjustable hold-off time is designed as a means of performance optimization for different SPADs. Finally, the impact of different hold-off times on the performance of the quenching circuit is described, showing that a shorter hold-off time provides a higher counting frequency, but that the hold-off time has to be long enough to prevent false detections. Also, the advantages of active over passive reset, in particular the shorter reset time, are demonstrated. REFERENCES [1] [2] [3] H. 
Dautet et al., “Photon-counting techniques with silicon avalanche photodiodes,” Applied Optics, vol. 35, pp. 3894-3900, 1993. J. Zhang, M. A. Itzler, H. Zbinden, J. W. Pan, “Advances in InGaAs/InP single-photon detector systems for quantum communication,” Light: Science & Applications 4, e286, 2015. M. Stipčević, D. Q. Wang, R. Ursin, “Characterization of a commercially available large area, high detection efficiency single-photon avalanche diode,” IEEE Journal of Lightwave Technology, vol. 31, no. 23, pp. 3591-3596, 2013. MIPRO 2016/MEET [4] A. Gallivanoni, I. Rech, M. Ghioni, “Progress in quenching circuits for single photon avalanche diodes,” IEEE Transactions on Nuclear Science, vol. 57, no. 6, pp. 3815-3826, 2010. [5] M. Stipčević, H. Skenderović, D. Gracin, “Characterization of a novel avalanche photodiode for single photon detection in VISNIR range,” Optics Express, vol. 18, issue 16, 2010. [6] B. F. Aull et al., “Geiger-Mode avalanche photodiodes for threedimensional imaging,” Lincoln Laboratory Journal, vol. 13, no. 2, pp. 335-350, 2002. [7] S. Cova, M. Ghioni, A. Lacaita, C. Samori, F. Zappa, “Avalanche photodiodes and quenching circuits for single-photon detection,” Applied Optics, vol. 35, no. 12, pp. 1956-1976, 1996. [8] S. Tisa, F. Guerrieri, F. Zappa, “Variable-load quenching circuit for single-photon avalanche diodes,” Optics Express, vol. 16, no. 3, pp. 2232–2244, 2008. [9] M. Liu, C. Hu, J. C. Campbell, Z. Pan, M. M. Tashima, “A novel quenching circuit to reduce afterpulsing of single photon avalanche diodes,” Proc. SPIE, vol. 6900, no. 5, 2008. [10] D. Bronzi et al., “Fast sensing and quenching of CMOS SPADs for minimal afterpulsing effects,” IEEE Photonics Technology Letters, vol. 25, no. 8, pp. 776-779, 2013. [11] Laser Components DG, Inc. Pulsed Laser Diodes - Avalanche Photodiodes Catalog, available at: http://www.lasercomponents.com/fileadmin/user_upload/home/Da tasheets/lc/kataloge/pld-apd.pdf [12] M. G. Liu, C. Hu, J. C. Campbell, Z. Pan, M. M. Tashima, “Reduce afterpulsing of single photon avalanche diodes using passive quenching with active reset,” IEEE Journal of Quantum Electronics, vol. 44, no. 5, pp. 430-434, 2008. 45 Impact of the Emitter Polysilicon Thickness on the Performance of High-Linearity Mixers with Horizontal Current Bipolar Transistors J. Žilak*, M. Koričić, H. Mochizuki**, S. Morita** and T. Suligoj* * University of Zagreb, Faculty of Electrical Engineering and Computing, Department of Electronics, Microelectronics, Computing and Intelligent Systems, Micro and Nano Electronics Laboratory, Zagreb, Croatia ** Asahi Kasei Microdevices Co. 5-4960. Nobeoka, Miyazaki, 882-0031, Japan jzilak@zemris.fer.hr Abstract - The impact of the emitter polysilicon etching in Tetramethyl Ammonium Hydroxide (TMAH) on the characteristics of high-linearity mixers fabricated with the low-cost Horizontal Current Bipolar Transistor (HCBT) is analyzed. During emitter formation, the thick layer of α-Si is deposited over the whole wafer, which is then etched-back in the TMAH. The emitter thickness depends on the TMAH etching time and impacts the HCBT's electrical characteristics. Active down-converting mixers with opencollector topology based on Gilbert cell are fabricated with two types of HCBTs with different TMAH etching time using the lowest-cost HCBT technology with CMOS n-well region for n-collector. Measurements of mixers' characteristics are done on-wafer by using the multi-contact probes. The mixers achieve maximum IIP3 of 20.2 dBm and conversion gain of 4 dB. 
Differences in the performance characteristics between the two mixer types are small, indicating that the sensitivity of the HCBT circuit performance to emitter thickness variations is relatively small.
This work was supported by the Croatian Science Foundation under contract no. 9006.
I. INTRODUCTION
The improvement of the high-frequency response of CMOS devices, enabled by downscaling, has led to the widening of CMOS technology usage in wireless and other radio frequency (RF) analog circuit applications. In order to keep its low-cost and high-volume production, scaling techniques and further CMOS development require increasing investments [1]. On the other hand, bipolar technologies are suitable for the mentioned RF applications at coarser technology nodes due to better high-frequency characteristics, noise factor and higher gain [2], [3]. Hence, the solution to the very cost-sensitive demands of the RF integrated circuits (RFICs) market is the addition of bipolar devices to coarser lithography CMOS technology. It is critical that such integration is done with a minimum number of new masks and process steps, keeping the fabrication costs as low as possible. The Horizontal Current Bipolar Transistor (HCBT) technology, developed with a novel technological approach as an add-on to a 180 nm CMOS process, is an example of such low-cost integration. The integration is accomplished with the addition of only 2 or 3 lithography masks and a few process steps. Fabricated HCBTs with a very high peak cutoff frequency (fT) of 51 GHz and a maximum frequency of oscillation (fmax) of 61 GHz, along with a collector-emitter breakdown voltage (BVCEO) of 3.4 V, have been demonstrated [4]. Moreover, high-voltage transistors are integrated in the process at zero cost [5].
RF mixers are widely used in mobile and base station transceivers, as well as in general purpose RF systems. Spectral efficiency and intermodulation distortion, generated due to the nonlinear nature of active elements, are important effects in modern systems, imposing high demands on RF mixers [6]. Hence, high-linearity mixers with low power consumption are preferable in integrated mixer design. On the other hand, high performance mixers can be fabricated with a minimum number of large-area passive components placed on the chip. Therefore, mixer design has shown to be suitable for RFIC design in the novel HCBT technology, and high-linearity active mixers with an input 3rd order intercept point (IIP3) of 23.8 dBm have recently been demonstrated [7]. In this paper, the impact of the emitter polysilicon etching in Tetramethyl Ammonium Hydroxide (TMAH) on the characteristics of high-linearity mixers fabricated with the lowest-cost HCBTs, using the CMOS n-well region as the n-collector with only 2 additional masks, is analyzed.
Figure 1. Cross-section of the HCBT: a) α-Si deposition prior to TMAH etching, b) final structure, c) TEM of the intrinsic region with the short-TMAH, d) TEM of the intrinsic region with the long-TMAH.
II. FABRICATED HCBT STRUCTURES
The HCBT fabrication sequence and its integration with the CMOS baseline process using a base-after-gate scheme are described in detail in [8]. All examined mixers are designed and fabricated by using the CMOS n-well collector HCBT with a single polysilicon region. It is the lowest-cost version of the HCBT since it uses only 2 additional lithography masks and skips the additional n-collector implantations. A cross-section of the HCBT structure is shown in Fig. 1b. The emitter formation, as the subject of this study, is described in more detail as follows. The in situ phosphorus-doped amorphous silicon (α-Si) (Fig. 1a) is deposited and etched back by using Tetramethyl Ammonium Hydroxide. The deposited polysilicon layer has to be thick enough to achieve a planar surface of the deposited film. A thin native oxide layer, grown during the pre-deposition annealing, serves as a protection layer keeping the n-hill from the TMAH etching. At the same time, the oxide is thin enough to allow the current flow into or out of the n-hill. The TMAH etching is time-controlled and the final polysilicon thickness is determined by the etching duration. Two TEM micrographs of fabricated HCBT structures with different TMAH etching times are shown in Fig. 1. The shorter time (short-TMAH) results in a thicker polysilicon layer (Fig. 1c) and the longer etching time (long-TMAH) results in a thinner polysilicon layer (Fig. 1d).
Figure 2. Measured Gummel plots of the HCBTs with the short- and long-TMAH etching times with an emitter area of 0.1×1.8 µm² at VCE = 2 V (collector and base current vs. base-emitter voltage).
Figure 3. Cutoff frequency (fT) and maximum frequency of oscillations (fmax) versus collector current of the HCBTs with the short- and long-TMAH etching times, with an emitter area of 0.1×1.8 µm² at VCE = 2 V.
Electrical characteristics of the unit transistors, which differ only in the TMAH etching times, are measured. The Gummel characteristics of both HCBT types are shown in Fig. 2.
The HCBT with the long-TMAH has a smaller base current (IB) at the base-emitter voltage VBE = 0.9 V, which is the bias point around peak fT. Consequently, the long-TMAH HCBT has a higher peak current gain, βlong-TMAH = 133, in comparison to βshort-TMAH = 117. This is due to the longer distance between the intrinsic emitter and the extrinsic base (p+ base) and the consequently reduced electron injection into the base contact. Moreover, the charge sharing between the intrinsic and extrinsic base acceptors is reduced if the intrinsic transistor is at a larger distance from the extrinsic base, as explained in [8]. The reduced charge sharing effect is also beneficial for fT and fmax (Fig. 3), resulting in a higher fT in the case of the long-TMAH HCBT (thinner emitter polysilicon). The fmax benefits from the increased fT, but its increase is offset by an increased base resistance in the case of the thinner emitter of the long-TMAH HCBTs [8].
III. HCBT MIXER DESIGN
High-linearity mixers are designed and fabricated using both unit HCBTs with short- and long-TMAH etching. A down-converting active mixer, using a double-balanced Gilbert cell topology [9] with an open-collector output, is designed and its schematic is shown in Fig. 4. The main circuit parts are a differential pair (Q1 and Q2) as a transconductance stage that transforms the input voltage signal to a current, and a switching quad consisting of 4 transistors (Q3~Q6) that commutates the current, providing frequency conversion. The differential Local Oscillator (LO) signal is generated by the LO buffer circuit. It should be noted that 128 unit HCBT transistors are connected in parallel in the design for each of the Q1~Q6 transistors in the scheme.
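To connect the device-level numbers above to the mixer design that follows, the short sketch below estimates the per-unit bias and the transconductance of one transconductance-stage transistor built from 128 parallel units. The 50 mA total mixer current is a bias point used for the measurements reported later in the paper; splitting it equally between the two branches and using gm ≈ IC/VT are simplifying assumptions for illustration only.

```python
# Back-of-the-envelope bias for the Gilbert-cell transconductance stage.
# Assumes the 50 mA mixer current (used in the measurements later in the
# paper) splits equally between Q1 and Q2, each built from 128 parallel
# unit HCBTs, and that gm ~ IC / VT for a bipolar transistor.
k_B, q, T = 1.380649e-23, 1.602176634e-19, 300.0
V_T = k_B * T / q                  # thermal voltage, ~25.9 mV at 300 K

I_mix   = 50e-3                    # A, total mixer current (from the paper)
n_units = 128                      # parallel unit HCBTs per transistor
I_C     = I_mix / 2                # A, per transconductance-stage transistor
I_unit  = I_C / n_units            # A, per unit device

g_m = I_C / V_T                    # S, transconductance of one paralleled device
print(f"V_T            = {V_T * 1e3:.1f} mV")
print(f"I_C per branch = {I_C * 1e3:.1f} mA, per unit = {I_unit * 1e3:.3f} mA")
print(f"g_m (approx.)  = {g_m:.2f} S  ->  1/g_m = {1 / g_m:.2f} Ohm")
```

With 1/gm on the order of 1 Ω, the degeneration resistor RE dominates the 1/gm + RE term in the conversion-gain approximation introduced next, which is consistent with the small sensitivity of the mixer characteristics to the transistor variants observed later in the paper.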
The current mirror (Q7 and Q8) sets the bias current. Degeneration emitter resistors (RE) are used in order to improve linearity, since it is a critical parameter of mixers in wireless transceivers. The input impedances are designed to be 50 Ω. The open-collector design provides flexibility in setting parameters such as the output impedance, conversion gain and linearity. Moreover, it can be used effectively with both differential and single-ended filter configurations that are usually connected to the mixer outputs. Mixer performance is, in this case, highly dependent on its output network. It has to reduce the high output impedance of the switching quad transistors and to provide an interface to the load, which assures adequate power transfer with a minimum number of external components [10].
Figure 4. Double-balanced active mixer based on a Gilbert cell with an open collector, which is implemented on-wafer, except for the output network that is added on an external PCB, as marked.
Figure 5. Measurement setup used for the on-wafer measurements, with a photo of the mixer chip and of the additional PCB.
The voltage swing at the output of the LO buffer circuit has to be large enough to switch the switching quad transistors on and off completely (Fig. 4). The buffer amplifies the input LO signal and provides the single-ended to differential conversion. That is accomplished by the differential amplifier, which is the main part of the buffer circuit. Moreover, the buffer improves the isolation between the LO and the other mixer ports (RF and output IF) and sets the proper DC voltage level for the switching quad transistors (Q3~Q6 in Fig. 4). The output stage of the buffer is built from emitter followers, which need to assure sufficient current for driving the LO switching quad capacitances.
The conversion gain (CG) and the IIP3 are the most common figures of merit in mixer performance characterization. They are determined by the transconductance of the differential pair (gm) and the value of the emitter degeneration resistance RE, as well as by the shape and the magnitude of the LO signal generated by the LO buffer. The CG can be approximated by [11]
$CG \approx \frac{2}{\pi}\cdot\frac{R_L}{1/g_m + R_E}$, (1)
where RL is the load resistance. Higher gm and RL result in a higher CG, while a higher RE reduces it. On the other hand, both a high gm and a high RE are beneficial for the IIP3 as a linearity figure of merit, as approximated in [11]
$IIP3 \approx 4\sqrt{2}\,v_T\,(1 + g_m R_E)^{3/2}$, (2)
where vT is the thermal voltage. Hence, in order to achieve a CG above 0 dB, the mixer current is increased to have a higher gm by putting the unit HCBTs in parallel. The chosen value of RE is 20 Ω and it assures high-linearity behavior.
Figure 6. Measured input 3rd order intercept point (IIP3) and conversion gain (CG) vs. LO power level (PLO) of the mixers with RE = 20 Ω fabricated with the short- and long-TMAH HCBTs at the mixer current of 50 mA. Measurement setup of the two-tone test: PRF = -10 dBm, fRF = 900 MHz, fIF = 20 MHz, ∆f = 2 MHz, VCC = 5 V.
IV. HCBT MIXER MEASUREMENTS
The fabricated mixers are measured on-wafer by using the multi-contact probes. The measurement setup is shown in Fig. 5. The probes used have in total 4 ground-signal-ground RF ports connected to the signal pads and several DC probes for the power supply, reference voltages and current connections.
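Before turning to the measured results, the sketch below evaluates approximations (1) and (2) for a few values of RE to make the gain/linearity trade-off explicit. The gm value follows the rough bias estimate given earlier; the 200 Ω per-side load resistance is an assumption loosely motivated by the 400 Ω differential load presented by the output balun described below. Only the trend matters here: the expressions ignore the switching quad, the LO drive and the output network, so the absolute numbers should not be compared directly with the measured CG and IIP3.

```python
# Illustrative evaluation of the CG and IIP3 approximations (1) and (2).
# gm ~ 0.97 S follows the rough bias estimate above; RL = 200 Ohm per side is
# an assumption (the balun described below presents ~400 Ohm differentially).
# Only the trend with RE is meaningful, not the absolute values.
import math

g_m = 0.97          # S, transconductance of the paralleled input device
R_L = 200.0         # Ohm, assumed single-ended load resistance
v_T = 0.0259        # V, thermal voltage at ~300 K

for R_E in (10.0, 20.0, 40.0):
    cg_lin = (2.0 / math.pi) * R_L / (1.0 / g_m + R_E)               # eq. (1)
    cg_db  = 20.0 * math.log10(cg_lin)                               # voltage gain in dB
    iip3_v = 4.0 * math.sqrt(2.0) * v_T * (1.0 + g_m * R_E) ** 1.5   # eq. (2)
    print(f"RE = {R_E:4.1f} Ohm:  CG ~ {cg_db:5.1f} dB,  "
          f"IIP3 voltage intercept ~ {iip3_v:6.2f} V")
```

Increasing RE lowers the conversion gain and raises the intercept point, which is exactly the trade-off that motivates the chosen RE = 20 Ω and the comparison with the undegenerated mixers later in the paper.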
Both input ports (RF and LO) are fully differential but they are driven by the single-ended signal with another input grounded. Two RF signal generators are used for the RF port driving, since a twotone test is required for linearity measurements. The signals are added together by using a power combiner. The mixer current is adjustable via a current source (ISET) connected to the current mirror input. Additional mixer's DC operating point tuning for the optimal performance can be achieved by DC voltages connected to the DC probes. The open-collector IF outputs are connected to a balun transformer. It is an off-chip component mounted on a small PCB designed for the measurement purpose. The balun is an 8:1 impedance transformer used to convert differential IF outputs to single-ended. When loaded with 50 Ω, the balun present a 400 Ω load to the mixer. The open-collector outputs are biased through the two inductors also placed on the external PCB with the power supply voltage VCC = 5 V. They are required since the maximum DC current of the balun is specified not to exceed the 30 mA and the total mixer current is larger. For the smaller mixer current consumption, the bias could be set through the center pin of the balun's primary winding. The output power is measured by a spectrum analyzer connected to the secondary winding of the balun. The CG and IIP3 of the examined mixers are measured at 900 MHz RF (fRF) frequency and -10 dBm input power (PRF). The output frequency (fIF) is 20 MHz and the LO frequency (fLO) is set accordingly. The frequency spacing (∆f) used for the two-tone testing is 2 MHz. The used power drive of the LO buffer (PLO) is 0 dBm. The power combiner loss, and losses due to cables used for the RF and the LO port driving are the only losses included in the results deembedding. The LO buffer circuit function is verified with the measurement of the IIP3 and CG dependence on the LO 48 MIPRO 2016/MEET 0 POUT, PIM3 (dBm) Return loss (dB) 5 10 15 RF return loss LO return loss 20 25 30 0 100 200 300 400 500 600 700 800 900 1000 Frequency (MHz) power level shown in Fig. 6. Results are shown for the mixers with RE = 20 Ω fabricated with both short- and long-TMAH HCBTs. The IIP3 and CG values are constant around PLO = 0 dB, which is the standard LO drive in the commercial mixers, while small decrease of CG is observed for the LO power levels smaller than -5 dBm. Since the input impedances are designed to be 50 Ω, there is no need for the input matching networks at both RF and LO ports. The return loss measurements, necessary for the input impedances testing, are done by using the vector network analyzer. The measured return loss is less than 10 dB in both mixer types and results of the mixer with RE = 20 Ω fabricated with the long-TMAH HCBTs is shown in Fig. 7. Hence, more than 90 % of the incident power coming from the RF signal generators is absorbed by the mixer ports in frequency range 10 MHz to 1 GHz. The measured IIP3 and CG dependence on the mixer current (IMIX) (without the LO buffer and bias circuitry current consumption) of mixers fabricated with both shortand long-TMAH HCBT types are shown in Fig. 8. The total power consumption is 425 mW at a mixer current of 40 mA, including the LO buffer and bias circuitry consumption. The highest IIP3 value of 20.2 dBm is obtained in the short-TMAH HCBT mixer at the mixer current of 45 mA. 
The mixer with the long-TMAH HCBTs has the IIP3 of 19.6 dBm at the same mixer 25 20 IIP3 (dBm) 15 15 IIP3 CG HCBT - short-TMAH HCBT - long-TMAH 10 10 5 5 0 Conversion Gain (dB) Mixers with RE = 20 Ω 20 0 10 20 30 40 50 60 70 Mixer Current (mA) 80 90 Figure 8. Measured input 3rd order intercept point (IIP3) and conversion gain (CG) vs. mixer current (IMIX) of the mixers with RE = 20 Ω fabricated with the short- and long-TMAH HCBTs. Measurement setup of the two-tone test: PRF = -10 dBm, PLO = 0 dBm, fRF = 900 MHz, fIF = 20 MHz, ∆f = 2 MHz, VCC = 5 V. MIPRO 2016/MEET POUT PIM3 HCBT - short-TMAH HCBT - long-TMAH -10 0 10 20 30 PIN (dBm) Figure 7. Measured return loss at RF and LO ports of the mixer with RE = 20 Ω fabricated with the long-TMAH HCBTs at the mixer current of 50 mA. 25 30 20 10 0 -10 -20 -30 -40 -50 -60 -70 -80 -90 -20 Figure 9. Measured output power (POUT) and 3rd order intermodulation distortion power (PIM3) vs. input power (PIN) of the mixers with RE =20 Ω fabricated with the short- and long-TMAH HCBTs at the mixer current of 50 mA. Measurement setup: fRF = 900 MHz, fIF = 20 MHz, PLO = 0 dBm, VCC = 5 V. current. The shape of the linearity dependence curve on mixer current is similar in both HCBT mixers. On the other hand, CG is higher by around 0.4 dB in the case of mixer with the long-TMAH HCBTs and has a value of 4 dB at the mixer current of 55 mA. The measured IIP3 values of both mixer types are smaller in comparison to the previously reported high-linearity mixers [7], where IIP3 = 23.8 dBm, since those are fabricated with the optimized n-hill collector HCBTs. In this work, the mixers are designed with the CMOS n-well region as n-collector HCBTs that are fabricated with only 2 additional lithography masks, unlike 3 masks used for optimized n-hill collector HCBTs. The output and 3rd order intermodulation distortion power versus input power of both mixer types are shown in Fig. 9. The IIP3 values can be observed as theoretical intercept point of fundamental tone (POUT) and 3rd order intermodulation tone (PIM3), while the CG values correspond to the difference in POUT and PIN. The IIP3 and CG start to degrade for PIN above 0 dBm where the large signal effects occur. They are characterized by the 1 dB compression point (P1dB) which is defined as the power of input signal where the power gain is degraded by 1 dB. The measured P1dB values are 2 dBm and 2.7 dBm in both mixers with the short- and long-TMAH HCBTs, respectively. The similar values of IIP3 and CG in both short- and long-TMAH mixers (Figs. 8 and 9) suggest that the TMAH etching time impact on the mixer performance is relatively small. The emitter degeneration resistor (RE = 20 Ω), necessary for the high-linearity performance, has a great impact on the mixer characteristics, according to (1) and (2). Hence, the differences in HCBT's TMAH etching times and their impact are masked by the RE. In order to gain a deeper insight into the effect of TMAH etching time, the mixers without emitter degeneration resistors are also fabricated and the IIP3 and CG dependence on the mixer current (IMIX) are measured. They have identical LO buffers as the mixers with RE. The results of both short- and long-TMAH HCBT mixers are shown in Fig. 10. The mixers without degeneration have larger CG values, but they are less linear, as expected, in comparison to the ones with degeneration. 
Since there is 49 20 25 IIP3 (dBm) 15 20 IIP3 10 CG 15 HCBT - short-TMAH HCBT - long-TMAH 5 10 0 5 -5 Conversion Gain (dB) Mixers without RE 0 10 20 30 40 50 60 70 Mixer Current (mA) 80 no RE, the measured mixer characteristics are more dependent on the HCBT transistor characteristics. The long-TMAH HCBTs have smaller IB at the same VBE in comparison to the short-TMAH HCBTs. It results in a better current mirroring meaning that for the same reference current (ISET), total mixer current (IMIX) is higher in the case of the long-TMAH mixer. For ISET of 5 mA, the short-TMAH mixers have IMIX around 44 mA and the long-TMAH mixers have IMIX around 58 mA. Considering IIP3 and CG, the differences in the measured values between two mixer types are higher in comparison to the mixers with RE = 20 Ω, but they are still relatively small. The difference in CG is ∆CG=1.1 dB and in IIP3 is ∆IIP3 = 1.0 dB at the mixer current of 50 mA. V. REFERENCES 90 Figure 10. Measured input 3rd order intercept point (IIP3) and conversion gain (CG) vs. mixer current (IMIX) of the mixers without RE fabricated with the short- and long-TMAH HCBTs. Measurement setup of the two-tone test: PRF = -10 dBm, PLO = 0 dBm, fRF = 900 MHz, fIF = 20 MHz, ∆f = 2 MHz, VCC = 5 V. CONLUSION The impact of the TMAH etching time, as an important HCBT technology fabrication step, on the performance of the high-linearity mixers is investigated. The TMAH time variation results in the different polysilicon emitter thicknesses. The HCBT with thinner emitter polysilicon, as a result of the long-TMAH etching time, has a smaller IB, higher β and fT. The peak IIP3 and CG values obtained in the mixers fabricated in this lowestcost HCBT technology are IIP3 = 20.2 dBm and CG = 4 dB. The differences in IIP3 and CG of mixers with HCBTs with 2 TMAH etching times at IMIX = 50 mA 50 are 0.6 dB and 0.4 dB respectively. The additional measurements on the mixers without emitter degeneration resistor RE also showed small differences in CG and IIP3 (∆CG=1.1 dB and ∆IIP3 = 1.0 dB), suggesting that the differences in electrical characteristics at transistor level, have a relatively small impact at the circuit level in the case of high-linearity mixers. Furthermore, the overall mixer performance demonstrates the suitability of the lowest-cost HCBT technology with CMOS n-well region as n-collector for the wireless communication market. [1] M. Feng, S.-C. Shen, D.C. Caruth, J.-J. Huan, “Device technologies for RF Front-End circuits in next-generation wireless communications,” Proceedings of the IEEE, vol. 92, pp. 354-375, Feb 2004. [2] P. Deixler et al., "QUBiC4G: a fT/fmax = 70/100 GHz 0.25 µm low power SiGe-BiCMOS production technology with high quality passives for 12.5 Gb/s optical networking and emerging wireless applications up to 20 GHz," in Proc. BCTM, 2002, pp. 201-204. [3] H. S. Bennet et al., “Device and Technology Evolution for Sibased RF Integrated Circuits,” IEEE Trans. Electron. Devices, vol. 52, pp. 1235-1258, Jul 2005. [4] T. Suligoj et al., “Horizontal Current Bipolar Transistor (HCBT) with a Single Polysilicon Region for Improved High-Frequency Performance of BiCMOS ICs,” IEEE Electron Device Lett., vol. 31, pp 534-536, Jun 2010. [5] M. Koričić, J. Žilak, T. Suligoj, “Double-Emitter ReducedSurface-Field Horizontal Current Bipolar Transistor with 36 V Breakdown Integrated in BiCMOS at Zero-Cost,” IEEE Electron. Device Lett., vol. 36, pp. 90 – 92, Feb 2015. [6] B. Razavi, RF Microelectronics, 2nd ed., New York, USA, Paerson Education, Inc., 2012. 
[7] J. Žilak, M. Koričić, H. Mochizuki, S. Morita, T. Suligoj, “Impact of Emitter Interface Treatment on the Horizontal Current Bipolar Transistor (HCBT) Characteristics and RF Circuit Performance,” in Proc. BCTM, 2015, pp. 31-34. [8] M. Koričić, “Horizontal Current Bipolar Transistor Structures for Integration with CMOS Technology”, doctoral thesis, FER, University of Zagreb, Croatia, 2008. [9] B. Gilbert, “A precise four-quadrant multiplier with subnanosecond response”, IEEE Journal of Solid-State Circuits, vol. 3, pp. 365-373, 1968. [10] M. B. Judson, "Low-voltage front-end circuits: SA601, SA602," Philiphs Semiconductors, Aplication Note AN1777, Aug. 1997. [11] J. Rogers and C. Plett, "Mixers," in Radio Frequency Integrated Circuit Design, Boston, MA, USA: Artech Hounce Inc., 2003. MIPRO 2016/MEET Fully-integrated Voltage Controlled Oscillator in Low-cost HCBT Technology M. Koričić *, J. Žilak *, H. Mochizuki **, S. Morita ** and T. Suligoj* * University of Zagreb, Faculty of Electrical Engineering and Computing, Department of Electronics, Microelectronics, Computing and Intelligent Systems, Micro and Nano Electronics Laboratory, Zagreb, Croatia ** Asahi Kasei Microdevices Co. 5-4960. Nobeoka, Miyazaki, 882-0031, Japan marko.koricic@fer.hr Abstract - Design of cross-coupled voltage controlled oscillator in low-cost HCBT technology is presented. Beside the low-complexity front-end devices, only 2 metal layers are used and the passives are implemented in the available on-chip structures. Varactors are fabricated as pn-junctions by using the ion implantation from the technology. Symmetric inductors are fabricated by using the topmost metal layer. Since only 2 aluminum metal layers are available, small thickness of the aluminum layer and proximity of the silicon substrate limit the inductor quality factor. Varactor and inductor models for circuit simulations are developed by using the device and electromagnetic simulations, respectively, and are compared to measured characteristics of fabricated devices. I. INTRODUCTION Integration of bipolar transistors to the baseline coarser lithography Complementary-Metal-Oxide-Semiconductor (CMOS) processes has become an attractive technological approach for the extension of the technology applications, without a significant increase of fabrication cost [1]. A typical application is in the wireless communication circuits where benefits are gained by the improved high frequency and noise performance of bipolar transistors. Such integration should be cost-effective to meet the demands of very cost-sensitive market. Horizontal current bipolar transistor (HCBT) is integrated with standard 180 nm CMOS technology with only 2 or 3 additional lithography masks and a small number of additional processing steps [2], resulting in the very low-cost BiCMOS technology. At the same time, electrical performance is comparable to more expensive vertical bipolar transistors with implanted base. Furthermore, high-voltage devices are added to the technology at zerocost [3], [4], extending the portfolio of the available devices. RF mixer is so far the only published RF circuit implemented in HCBT technology [5]. In order to fabricate other RF front-end circuits, high quality passive components are needed, which: (i) are not available in the technology in its experimental and development phase, and (ii) increase the fabrication cost. In this paper, the design of fully integrated cross-coupled voltage controlled oscillator (VCO) is presented. 
The goal is to design the VCO at the lowest possible cost by using only the available front-end process steps needed for HCBT fabrication, i.e., in the bipolar-only version of the HCBT technology, with only the first two metal layers of standard CMOS interconnect processing. In this paper, design, modeling and fabrication of passive components in HCBT technology are done for the first time.
This work was supported by the Croatian Science Foundation under contract no. 9006.
Figure 1. TEM cross-section of fabricated HCBT.
II. VCO DESIGN IN HCBT TECHNOLOGY
The VCO is fabricated by using only the process steps for the fabrication of HCBT devices. A TEM cross-section of the fabricated HCBT is shown in Fig. 1 with the transistor regions marked. Details about the process are given elsewhere [2]. A steep collector profile optimized for the performance of high-speed HCBT devices [6] is used for the fabrication of the VCO. The electrical schematic of the designed VCO is presented in Fig. 2a and the photograph of the fabricated chip in Fig. 2b. The core of the VCO consists of the cross-coupled differential pair biased by the tail current source (I0) realized by a simple current mirror. The bias current (IBIAS) is supplied off-chip. The resonant LC-tank of the circuit consists of a symmetric inductor and varactors, which are fabricated by the front-end process steps available in the technology. Output buffers are implemented as two stages of emitter followers in order to reduce the parasitic capacitance at the collectors of T1 and T2 and to provide the capability of driving the output pad and the input impedance of the spectrum analyzer. Also, a separate biasing is used in order to monitor the power dissipation of the VCO core. In order to sustain the oscillations, the cross-coupled pair generates a negative conductance at the collectors of T1 and T2, which replenishes the energy lost in the resistive component of the LC-tank.
TABLE I. SIMULATED ELECTRICAL CHARACTERISTICS OF INTEGRATED INDUCTORS WITH DIFFERENT GEOMETRICAL PARAMETERS
Parameter | Inductor 1 | Inductor 2
Wire width, W [µm] | 12 | 5.5
Wire spacing, S [µm] | 3 | 0.8
Number of turns, N | 3 | 10
Output diameter, D [µm] | 150 | 170
L @ fosc [nH] | 1.12 | 10
Q @ fosc | 1.487 | 1
Qmax | 3.6 (@ 10 GHz) | 1 (@ 2.3 GHz)
L @ f(Qmax) [nH] | 1.293 | 10
Rp @ fosc [Ω] | 26.2 | 157
I0 @ Voutpp = 0.6 Vpp [mA] | 12.4 | 1.5
Figure 2. a) Electrical schematic of the designed cross-coupled VCO. b) Chip photograph.
Figure 3. Layout of the symmetric inductor used in the design of the VCO.
The transconductance of the transistors (gm) should be high enough to satisfy the condition
$g_m \geq \frac{1}{R_p}$, (1)
where Rp is the equivalent parallel resistance of the inductor. In practice, gm is chosen higher than the marginal case in (1), to allow complete current steering and higher output voltage swings. As design constraints we choose: a) a tail bias current I0 smaller than 2 mA in order to have a power dissipation of less than 10 mW including the bias circuitry and excluding the output buffers, b) an output voltage swing around 0.6 Vpp in order to have complete current steering in the cross-coupled pair, c) a central frequency fc = 2.5 GHz and d) a tuning range of 10 % of the central frequency, i.e., ∆f = 250 MHz. In the case of complete current steering in the differential pair, the output voltage swing is [7]
$V_{out,pp} = \frac{4}{\pi} I_0 R_p$, (2)
where Rp is the equivalent parallel resistance of the resonant tank. In the first approximation, it is defined by the inductor parameters at the frequency of oscillation:
$R_p = 2\pi f_{osc} L \left(\frac{1}{Q} + Q\right)$, (3)
where fosc is the frequency of oscillation, L the inductance and Q the inductor quality factor. A high Q of the inductor is desirable due to smaller losses, which reflects to smaller power consumption. Furthermore, a phase noise, an important oscillator parameter, is proportional to 1/Q2 and it benefits from usage of high quality inductors. A. Inductor design and modeling Symmetric inductors shown in Fig. 3 are used in the LC-tank of VCO. Inductors are fabricated by using only 2 aluminum layers of 0.5 µm thickness. Since the sheet resistance of these layers is rather high, wide metal lines are used in order to minimize wire resistance and improve Q-factor. On the other hand, this increases the capacitive coupling to substrate limiting the maximum achievable Q due to substrate losses. Additionally, wider metal lines require larger spacing between the lines due to reliability of the interconnect processing, yielding a smaller mutual and hence the overall inductance. Furthermore, total inductor area, which is usually specified in terms of an output diameter, increases and the achievable inductance is limited. Results obtained by electro-magnetic (EM) simulations for 2 inductors realized with wide and moderately wide metal lines and with comparable output diameter (D) are shown in Table 1. The equivalent parallel resistance (Rp) is calculated at a target oscillation frequency (fosc=2.5 GHz) from (3) and the tail bias current (I0) for desired output voltage swing Voutpp=0.6 Vpp is MIPRO 2016/MEET Figure 4. Compact model of the symmetric inductor extracted for the use in the circuit simulation. Figure 6. Varactor: a) 2D device simulation model, b) simple π-model of capacitances. Due to symmetry, half of the structure is simulated. frequency divider is chosen to put a focus only on the performance of VCO. Figure 5. Results of the optimization used for the extraction of inductor model parameters from Fig. 3. TABLE II. PARAMETERS OF THE INDUCTOR COMPACT MODEL OBTAINED FROM THE OPTIMIZATION CF, [fF] 203 Rsub, [Ω] 602 Ls, [nH] 4.44 Cox1, [fF] 119 Rs, [Ω] 36 Csub1, [fF] 3.1e-6 Cox, [fF] 44.6 Rsub1, [Ω] 1506 Csub, [fF] 1.2e-9 calculated by combining (2) and (3). It can be seen that it is difficult to obtain a high Q inductor at frequency as low as 2.5 GHz. In the case of the inductor 1, Q=1.487, which is higher compared to the inductor 2. However, due to a larger width of the metal lines (W), a smaller length of the wire with comparable D, results in a much smaller inductance of the inductor 1. This translates to much higher I0 and increased power dissipation. The inductor 2 is designed to have the peak Q-factor around the targeted fosc=2.5 GHz. Due to the higher inductance, the I0 is within the specification of the power dissipation for the given output voltage swing. Therefore, the inductor 2 is used for the design of VCO. It should be noted that in the case of the inductor 1, a moderately high Q of 3.6 is obtained at f=10 GHz. Since the higher Q can be obtained at higher frequencies, VCO can be designed to operate at a higher frequency and then a frequency divider can be used to obtain the specified lower frequency. For example, if the inductor 1 is used for the design of VCO operating at 10 GHz and Voutpp=0.6 Vpp from (2) and (3) we can calculate the required tail current to be I0=1.5 mA. Furthermore, the phase noise performance would be improved as explained earlier. 
However, design of VCO at f=2.5 GHz without MIPRO 2016/MEET S-parameters obtained by the EM simulations are used to extract a compact model of the inductor, which is shown in Fig. 4. Optimization goals are defined as absolute differences between S11, S12 and S22 obtained from the EM simulation and the ones obtained from the compact model from Fig. 4 in the frequency range from 1 GHz to 8 GHz. Relative difference after the optimization is shown in Fig. 5 showing the error lower than 2 % in the frequency band of interest around 2.5 GHz, indicating the suitability of the model for the circuit simulation. The parameters of the compact model obtained by the optimization are listed in Table 2. B. Varactor design and modeling In order to tune the frequency of the VCO, variable capacitors are needed. Frequency of oscillation is: f osc = 1 2π L(C p + Cv ) (4) where Cv is variable capacitance and Cp is overall parasitic capacitance at the output node, including the collectorbase, the base-emitter and the collector-substrate capacitance of the cross-coupled pair as well as the buffer input capacitance, the inductor parasitic capacitance and interconnect capacitance (see Fig. 2a). Equation (4) is used to set the value of the varactor minimum and maximum capacitance for the target central frequency (fc) and the tuning range (∆f). Frequency tuning is done by reverse biased diodes, which are fabricated by using the ion implantation for the fabrication of the n-collector (nhill in Fig. 1) and the p+ extrinsic base of HCBT. Electrical characteristics and the circuit model of the diode are obtained by the TCAD device simulations. Doping profiles under the extrinsic base, which are measured by Secondary Ion Mass Spectrometry (SIMS) are loaded in the device simulator. Cross-section of the simulated 53 Figure 7. Measured SIMS profile used in the device simulation. Doping profile at the cutline VCL from Fig. 5. varactor is shown in Fig. 6a and the profiles used in the simulations obtained at the cutline VCL are shown in Fig. 7. AC simulations are performed at different bias points in order to obtain the capacitance-voltage (CV) characteristics. Capacitances of the structure from Fig. 6a can be represented by a simple π-network shown in Fig. 6b. It consists of two junction capacitances: cathodeanode capacitance (CjCA) and cathode-substrate capacitance (CjCSUB) as well as parasitic anode-substrate capacitance (CASUB), which is shown to be negligible from the simulation results. In SPICE-like programs the pnjunction capacitance in is modeled by: C jB = C jB 0  Vapp  1 +  Vbi   M (5) where CjB0 is zero-bias capacitance, Vbi built-in potential, M grading coefficient and Vapp voltage applied between the cathode and the anode. SPICE model parameters are extracted from the device simulation results which are then scaled in the circuit simulations to obtain target fc and ∆f. Finally, the actual physical design of varactor layout is carried out with the area defined by the circuit simulation results. Fitting of the SPICE model to the device simulation after scaling is shown in Fig. 8 The quality factor of varactors (QC) is important for total quality factor of resonant LC-tank. The total quality factor is: 1 1 1 = + QTOT QL QC (6) where QL and QC are quality factors of the inductor and the varactor, respectively. QC depends on the series resistances and hence is difficult to model by the device simulations. In the physical design of the circuit layout, p+ and n+ regions from Fig. 
6a are kept as close as possible with the constraint not to have too low breakdown voltage between the cathode and the anode. The cathode and the anode regions are realized as a long thin slices, which yields slightly larger series resistances (i.e., lower QC) compared to the case when n+ rings are formed around smaller p+ square-shaped regions. The structure with slices is chosen because it is dominated by 54 Figure 8. Comparison of varactor CV-characteristics obtained by the device simulations and from the extracted SPICE model. TABLE III. TRANSISTOR SIZES USED IN VCO DESIGN Transistor TB1 TB2 T1 T2 No. of unit transistors 1 8 8 8 TABLE IV. SIMULATED ELECTRICAL CHARACTERISTICS OF VCO Cps=80 fF Parasitic capacitance at the output Cps=0 VCC, [V] 3.3 3.3 IBIAS, [µA] 200 200 I0, [mA] 1.6 1.6 Minimum frequency, fmin, [GHz] 2.481 2.292 Maximum frequency, fmax, [GHz] 2.818 2.575 Tuning range, ∆f, [MHz] 337 283 Central frequency, fc, [GHz] 2.650 2.434 Output signal power, Pout, [dBm] 0.81 -0.47 Phase noise @100kHz offset, PN [dBc/Hz] -73.5 -73.2 the cross-section from Fig. 6a and is basically 2D structure, which is better predicted by the 2D device simulations. C. VCO design VCO is designed by using the inductor and the varactor models described in the previous sections. The size of the varactors is scaled in order to obtain the desired oscillation frequency (fosc) and the tuning range (∆f). HCBT model used in the design is previously developed standard Gummel-Poon model, which is already used in the design of RF mixers [5]. Unit size transistors are used and are connected in parallel depending on the required collector current. Transistor sizes in terms of number of parallel unit transistors are given in Table 3. The bias current supplied of the chip in Fig. 2a is IBAS=200 µA. The size ratio of TB1 and TB2 sets the tail bias current I0=1.6 mA. The sizes of the transistors T1 and T2 are chosen the same as in the tail current source in order to avoid large current density when current is completely steered in one of the branches. The results of simulated electrical performance of VCO are summarized in Table 4. Table includes the column with assumed parasitic capacitance at the output node which is used in simulations (Cps). It includes the wire capacitance and all possible discrepancies in capacitance in transistor and varactor model. Its value is set to fit the measured data. MIPRO 2016/MEET Figure 9. Comparison of measured and modeled inductor Q-factor. Figure 10. Comparison of measured and modeled inductance of the inductor. III. MEASUREMENT RESULTS Comparison of measured inductor Q-factor and inductance up to 8 GHz with the ones obtained by EM simulations and compact model used in the design phase are shown in Figs. 9 and 10, respectively. Good fitting between EM simulation and the compact model can be observed as expected from the optimization results shown in Fig. 5. Slightly higher value of inductance and Q-factor is obtained in measurements. De-embedding of the measurement results is done by using open-pad structure only, and portion of the wire connecting inductor to the pad adds small series inductance. Furthermore, the stack of the substrate, metal and dielectric layers and their electrical parameters which are set as the input to the EM simulator might introduce some error. Nevertheless, a satisfactory fit of the measured by modeled electrical characteristics is achieved. 
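The tuning behaviour implied by (4)-(6) can be cross-checked with a few lines of Python. The sketch below uses the extracted inductance Ls = 4.44 nH from Table 2; the parasitic capacitance, the zero-bias varactor capacitance, the junction parameters and the quality factors are illustrative placeholders rather than the extracted SPICE parameters of this design.

import math

def junction_cap(c_j0, v_rev, v_bi=0.7, m=0.4):
    # Reverse-biased pn-junction capacitance, cf. eq. (5): CjB = CjB0 / (1 + V/Vbi)^M.
    return c_j0 / (1.0 + v_rev / v_bi) ** m

def osc_freq(L, c_p, c_v):
    # Oscillation frequency of the LC tank, eq. (4).
    return 1.0 / (2 * math.pi * math.sqrt(L * (c_p + c_v)))

def tank_q(q_l, q_c):
    # Total tank quality factor, eq. (6): 1/Qtot = 1/QL + 1/QC.
    return 1.0 / (1.0 / q_l + 1.0 / q_c)

if __name__ == "__main__":
    L = 4.44e-9       # inductance from Table 2 [H]
    c_p = 0.55e-12    # assumed fixed parasitic capacitance at the output node [F]
    c_j0 = 0.45e-12   # assumed zero-bias varactor capacitance [F]
    for v_rev in (0.0, 1.5, 3.0):   # reverse bias across the varactor
        c_v = junction_cap(c_j0, v_rev)
        f = osc_freq(L, c_p, c_v)
        print(f"Vrev = {v_rev:.1f} V: Cv = {c_v * 1e15:5.0f} fF, fosc = {f / 1e9:.2f} GHz")
    print(f"Qtot for QL = 1.5 and QC = 10: {tank_q(1.5, 10):.2f}")

With these placeholder values the tank resonates in the 2.4-2.7 GHz range and, consistent with the measurement discussion above, QTOT remains close to the inductor QL even for a varactor QC as low as 10.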
Comparison of the CV characteristic obtained by the SPICE model used in the simulations and the measured Sparameters of fabricated varactor is shown in Fig. 11. Good fitting of the measured CjCA by the model is accomplished. The measured CjCSUB is underestimated in the simulation due to the underestimated peripheral component, which is not used in the 2D device simulation structure employed for the model development (see MIPRO 2016/MEET Figure 11. Comparison of measured and modeled CV-characteristics of the varactor Figure 12. Measured quality factors of varactor at the anode and the cathode terminals at frequncy f=2.5GHz. Fig. 6a). On the other hand, the voltage dependency is well reproduced and model can be easily calibrated by adjusting the parameter CjB0 in (5). The extracted quality factor of the varactor (QC) from the measured Sparameters at 2.5 GHz is shown in Fig. 12. The QC measured at the cathode plays a more significant role since it is connected to the output node. QC above 10 is achieved for all applied control voltages, which is rather low value, but according to (6), LC-tank quality factor (QTOT) is still dominated by the inductor in our design. Measured output signal power and frequency of oscillation dependency on control voltage (VCTRL) at IBIAS=200 µA of fabricated VCO are shown in Fig. 13 and the results are summarized in Table 5. Measurements are made directly on-chip by using multi-contact probes. Maximum and minimum fosc for VCTRL of 0 V and 3 V are 2.53 GHz and 2.295 GHz, respectively, resulting in the tuning range (∆f) of 235 MHz, which is 9.75 % of the central frequency (fc=2.41 GHz). Measurement results fit well to the results of the simulated VCO with assumed parasitic capacitance Cps=80 fF reported in Table 4. The assumed Cps takes into account the discrepancy of CjCSUB in Fig. 11 as well as the wire capacitance at the output node. The measured output power shows discrepancy compared to the simulation results, which is attributed to the actual buffer output impedance driving the 50 Ω input 55 TABLE V. MEASURED ELECTRICAL CHARACTERISTICS OF VCO 3.3 VCC, [V] IBIAS, [µA] 200 I0, [mA] 1.74 Minimum frequency, fmin, [GHz] 2.295 Maximum frequency, fmax, [GHz] 2.53 Tuning range, ∆f, [MHz] 283 Central frequency, fc, [GHz] 2.434 Output signal power, Pout, [dBm] -3.3 Phase noise @100kHz offset, PN [dBc/Hz] Not available 100 kHz offset from the carrier frequency (see result for Cp=80 fF in Table 4), which is rather high value. Since the 2 it can mainly be phase noise is proportional to 1 QTOT improved by incorporating high-Q inductors and varactors. Figure 13. Measured oscillation frequency and output signal power dependency on control voltage of fabricated VCO. IV. CONLUSION Design of the cross-coupled voltage controlled oscillator in low-cost HCBT technology is presented. Varactors and inductors are designed and modeled by device and electromagnetic simulations and fabricated by using the available process steps without altering the technology. Good agreement between modeled and measured electrical characteristics of fabricated passive components is accomplished. It is shown that the VCO performance is limited by the low Q-factor of the inductor, which is fabricated in the 2nd aluminum metal layer of standard CMOS interconnect. REFERENCES [1] Figure 14. Measured oscillation frequency and output signal power dependency on tail bias current of fabricated VCO. 
[2] impedance of the spectrum analyzer as well as to the reduction of the QTOT by the resistance of the interconnect wires that is not taken into account in the circuit design. Reduction of the Pout for larger VCTRL can be also partly attributed to the reduction of QTOT of the LC-tank. VCTRL is applied at the anode of the varactor, and since the cathode voltage is approximately VCC, the cathode-anode voltage is decreased when VCTRL is increased resulting in the decrease in QC (see Fig. 12). Hence, according to (6), QTOT decreases. Measured Pout and fosc dependencies on the tail bias current (I0) of the cross-coupled pair for two marginal control voltages (VCTRL) which define the tuning range, are shown in Fig 14. Frequency tuning range and Pout are relatively constant for the I0 between 1.5 mA and 2.5 mA. Pout rises for larger I0, which agrees with (2), and then falls-off when transistors enter the high current regime. Direct phase noise measurement method by using the spectrum analyzer, which is available at the moment, is not suitable for characterization of free-running VCOs [8]. Therefore, the simulation results are given in this paper. Results show the phase noise of -73.2 dBc/Hz at the 56 [3] [4] [5] [6] [7] [8] H. S. Bennett, R. Brederlow, J. C. Costa, P. E. Cottrell, W. M. Huang, A. A. Immorlica, Jr., J.-E. Mueller, M. Racanelli, H. Shichijo, C. E. Weitzel, and B. Zhao, “Device and technology evolution for Si-based RF integrated circuits,” IEEE Trans. Electron Devices, vol. 52, no. 7, pp. 1235–1258, Jul. 2005. T. Suligoj, M. Koričić, H. Mochizuki, S. Morita, K. Shinomura, and H. Imai, “ Horizontal Current Bipolar Transistor (HCBT) with a Single Polysilicon Region for Improved High-Frequency Performance of BiCMOS ICs,” IEEE Electron Device Lett., vol. 31, no. 6, pp. 534-536, June 2010. M. Koričić, T. Suligoj, H. Mochizuki, S. Morita, K. Shinomura, and H. Imai, “Double-Emitter HCBT Structure—A High-Voltage Bipolar Transistor for BiCMOS Integration,”, IEEE Trans. Electron Devices, vol. 59 , no. 12 pp. 3647 – 3650, Dec. 2012. M. Koričić, J. Žilak, T. Suligoj, “Double-Emitter ReducedSurface-Field Horizontal Current Bipolar Transistor With 36 V Breakdown Integrated in BiCMOS at Zero Cost,” IEEE Electron Device Lett., vol. 36, no. 2, pp. 90-92, Feb. 2015. J. Žilak, M. Koričić, H. Mochizuki, S. Morita, T. Suligoj, “Impact of Emitter Interface Treatment on the Horizontal Current Bipolar Transistor (HCBT) Characteristics and RF Circuit Performance,” in Proc. BCTM, 2015, pp. 31-34. T. Suligoj, M. Koričić, H. Mochizuki, S. Morita, K. Shinomura, and H. Imai, “Collector Region Design and Optimization in Horizontal Current Bipolar Transistor (HCBT),” in Proc. IEEE Bipolar / BiCMOS Circuits and Technology Meeting, 2010, pp. 212-215. Behzad Razavi, “Oscillators,” in RF Microelectronics, 2nd edition, Boston, MA, USA, Prentice Hall, 2011. Keysight Technologies, “Phase Noise Measurement Solutions,” Selection guide, USA, 5990-5729EN, Aug. 2014. MIPRO 2016/MEET Variable-Gain Amplifier for Ultra-Low Voltage Applications in 130nm CMOS Technology Daniel Arbet, Martin Kováč, Lukáš Nagy, Viera Stopjaková and Michal Šovčı́k Department of IC Design and Test Faculty of Electrical Engineering and Information Technology Slovak University of Technology Bratislava, Slovakia e-mail: daniel.arbet@stuba.sk Abstract—The paper deals with design and analysis of a variable-gain amplifier (VGA) working with a very low supply voltage, which is targeted for low-power applications. 
The proposed amplifier was designed using the bulk-driven approach, which is suitable for ultra-low voltage circuits. Since the power supply voltage is less than 0.6 V, there is no risk of latch-up, which is usually the main drawback of bulk-driven topologies. The proposed VGA was designed in 130 nm CMOS technology with a supply voltage of 0.4 V. The achieved results indicate that the gain of the designed VGA can be varied from 0 dB to 18 dB. Therefore, it can be effectively used in many applications, such as an automatic gain control loop with an ultra-low supply voltage, where dynamic range is an important parameter. I. INTRODUCTION Advanced nanotechnologies enable very-large-scale integration and bring the opportunity to design ultra low-power analog and mixed-signal integrated systems. In modern nanoscale CMOS processes, the power supply voltage is also continuously scaled down. Moreover, the threshold voltage of MOS devices does not decrease with the same slope as the supply voltage. The trend towards ultra-low supply voltages can effectively increase the battery life of portable electronics, biomedical implanted devices, hearing-aid devices, etc. One of the most important building blocks of analog integrated circuits (IC) is the variable gain amplifier (VGA), which is used in many applications in order to stabilize the voltage amplitude of the signal at its output. A VGA is usually employed in an automatic gain control (AGC) circuit to maximize the dynamic range of the whole system. Since the proposed VGA is meant to be used in low-voltage applications, the possibility of using standard amplifier topologies is limited (because the supply voltage is usually less than 1 V). Standard VGA topologies are usually based on the conventional differential structure [1]–[3]. Since there are four or more stacked transistors, these topologies require a high supply voltage and are not suitable for ultra-low voltage applications. One possible VGA topology suitable for a low supply voltage is based on the pseudo-differential difference amplifier (PDDA). The PDDA topology can effectively increase the input and output voltage ranges. On the other hand, disadvantages of the PDDA include high sensitivity to process and temperature variations (PVT) and a low value of the CMRR (Common-Mode Rejection Ratio) due to the missing tail current source. Therefore, a Common-Mode Feedback (CMFB) or Common-Mode Feedforward (CMFF) circuit is usually employed to stabilize the operating point and to increase the CMRR of the PDDA [4], [5]. In the case when the PDDA is based on two common-source amplifiers [4], the input voltage range is limited by the threshold voltage of the input transistors. In order to increase the input voltage range, a rail-to-rail input stage has to be used. For this purpose, unconventional design techniques are needed. In order to overcome this limitation and also to increase the input voltage range of the VGA, the so-called bulk-driven technique (MOS devices are controlled by the bulk instead of the gate) was used to design the proposed VGA. The bulk-driven approach has been employed to design a number of analog building blocks [7], [8]. In this paper, the design of an ultra low-voltage bulk-driven VGA based on the pseudo-differential topology is presented and its main parameters are analysed. In Section II, the proposed topology of the VGA is presented. The small-signal analysis performed on the VGA is then described in Section III. The achieved results are presented in Section IV.
In Section V, the achieved parameters of the developed VGA are summarized and shortly discussed. II. P ROPOSED VARIABLE G AIN A MPLIFIER A. VGA general description CTRL +IN VGA +OUT21 + _ PDDA + _ -OUT22 -IN CMFF CMFB Fig. 1. Block diagram of the proposed VGA Fig. 1 shows the block diagram of the proposed VGA. The main block representing the VGA core is a bulk-driven fully PDDA (FPDDA) with the gain control input terminal. FPDDA is based on two common-source amplifiers, where a bulk-driven input transistor is used. In order to stabilize the operational point and increase the CMRR of the proposed 57 VGA, CMFB and CMFF circuits were employed. To achieve good stability of the CMFB loop, frequency compensating capacitors have been used (not depicted in Fig. 1). B. Proposed Pseudo-Differential VGA topology Schematic diagram of the proposed ultra-low voltage VGA core circuit is depicted in Fig. 2. The VGA was designed in 130 nm CMOS technology and can reliably operate at the supply voltage of 0.4 V. VDD M7 -OUT M8 +OUT CMFB_out VDD M6 M5 drop on diode-connected transistor M13) is stable. Difference between values of common-mode voltage at the VGA inputs will change the current through transistor M11. This current is mirrored by transistor M12 to transistor M13 and its voltage drop will change. This principle was used to regulate (gate) bias voltage for input transistors M1-M4 when the common-mode voltage at the inputs of VGA is changed due to PVT. Finally, voltage drop on the transistors M1 and M2 was used as bias voltage for input transistor in the CMFF circuit (M9 and M10). Thank to this biasing technique, the VGA sensitivity to common-mode voltage as well as PVT is reduced. M12 CTRL +IN cmff_b1 M3 M1 cmff_b2 CMFF_out M11 CMFF_out Gain control M4 -IN M13 +IN -IN cmff_b1 M9 M2 cmff_b2 M10 Fig. 3. Schematic diagram of CMFF circuit Fig. 2. Schematic diagram of the bulk-driven FPDDA-based VGA The proposed topology is based on the cascode FPDDA, where bulk-driven (BD) input MOS transistors were used. Input BD transistors are used to obtain the rail-to-rail input voltage range. Unfortunately, bulk transconductance of a MOS transistor is given by the following expression gmb ≈ 0.2gm , which means that the gain and gain-bandwidth product (GBW) of the proposed VGA will be decreased. On the other hand, the circuits designed by the bulk-driven approach are useful for ultra low-voltage and low power applications. Generally, gain of the VGA can be varied by controlling its total conductance or the total output resistance. Thus, in our case, transistors M5 and M6 were employed to control the VGA gain. Voltage change at the VGA control terminal (CTRL) causes a change in current flowing through the input transistors M1 and M2, and thereby regulates their transconductance. Thus, changes in transconductance of the input transistors lead to variation of the total VGA gain. Therefore, the total gain of VGA is directly proportional to transconductance of the gain control transistors (gm5 and gm6 ). Additionally, the gain control transistors together with the input transistors represent a cascode stage and therefore, gm5 and gm6 influence also the output resistance of VGA. Detailed small-signal analysis of the designed VGA is described in Section III. C. CMFF circuit In order to adjust the bias voltage (CMFF out) of input transistors (M1-M4), a CMFF circuit depicted in Fig. 3 was used. Transistors M9 and M10 represent the input transistors of the CMFF circuit, which follow the signal at the input of VGA. 
In the case of differential input signal, there is no change in the current flowing trough the diode-connected transistor M11, and voltage at the CMFF output (voltage 58 D. CMFB circuit In general, a CMFB is needed to prevent the output voltages from saturating to one of the power rails when the input common-mode voltage varies or there is a change in the circuit operating point due to VGA control voltage. A CMFB usually consist of a common–mode voltage detector and an error amplifier, as presented in [9]. In general, this approache suffers from chip area overhead in terms of resistors implementation and separation function of two main active blocks. In addition, potential incompatibality of conventional structures with ultra–low voltage application is increased (especially with the supply voltage value deep below 1 V), which results in frequently used low voltage realizations like current-based operation CMFB (CB–CMFB) or its improved version [10] combining the common–mode detection and gain functionality if comapared to many published common–mode detection only realizations [9]. The novel self-biased bulk driven CMFB circuit that combines common–mode voltage detection and error amplification capabilities is introduced (Fig. 4), and used as the most important part of the low-voltage VGA based on pseudo–differential structure. The main idea is slightly similar to CB–CMFB, where two differential pairs (N 2x, N 3x) with purposely degraded current sources (N 1x) of the same polarity are connected in parallel but in our design, current mixing is accomplished in the output node CM F B out across two current mirrors (N 4x, N 5x). Degraded tail current sources (diode-connected transistors in this case, in opposite to CB–CMFB) was employed to improved entire common-mode gain of the VGA. We have to note that blue dashed feedback consisting of these two transistors can works as a negative feedback (across N 3x) or a partially positive feedback (across N 4x, N 5x). Additionally, the output node MIPRO 2016/MEET + gm5Vgs5 rds5 Vgs5 + 1 2 Vin 1g V 2 mb1 in rds1 Vbs1 - + 1g V 2 m3 in rds4 rds7 1 2 Vout Vds1 - Fig. 5. Small-signal model of proposed VGA Using the small-signal model (Fig. 5), the total VGA transconductance Gm can be expressed as follows: Gm = gmb · Fig. 4. Schematic diagram of the bulk-driven CMFB: original design (black), modification (red), N–type realization is jammed between two current sources (N 3x, N 5x) with high output impedance. Therefore, further gain improvement can be expected, which is in the case of inherited low gain bulk–driven configuration, the main effort. Finally, similar to the PDDA, also self-biased approach was employed into CMFB design that slightly contributes to improvement of the entire PSSR of VGA and eliminates the need for a separate bias circuit. Originally, the designed CMFB needs only one bias voltage for transistors N 2x and N 3x that was in case of self-biased configuration formed by antiseries diode like connected transistors N 6x and N 7. This forms two additional feedback loops, where green dashed and red dashed loop introduces partially positive and negative feedback, respectively. Therefore, design of the CMFB circuit must be carefully taken and rather investigated as part of the whole VGA, especially, if suitable common-mode phase margin requirements have to be fulfilled. 
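The gain-control mechanism described in Section II is quantified by the small-signal relations derived in the next section; as a preview, the Python sketch below evaluates the simplified expressions A = Gm·Rout with Gm = gmb1·K − gmb3, K = gm5/(gds1 + gm5) and the cascoded output resistance, using placeholder transconductances (with gmb ≈ 0.2·gm, as noted above). It is only a sketch of the trend; the fabricated VGA additionally contains the second stage and the CMFB/CMFF loops, which are not modeled here.

import math

def vga_gain_db(gm5, gmb1, gmb3, gds1, rds1, rds4, rds5, rds7):
    # Simplified low-frequency gain of the VGA core (cf. Section III):
    # K = gm5/(gds1+gm5), Gm = gmb1*K - gmb3, Rout = cascoded branch || rds4 || rds7.
    K = gm5 / (gds1 + gm5)
    Gm = gmb1 * K - gmb3
    r_casc = (1 + gm5 * rds1) * rds5 + rds1
    Rout = 1.0 / (1.0 / r_casc + 1.0 / rds4 + 1.0 / rds7)
    return 20 * math.log10(abs(Gm * Rout))

if __name__ == "__main__":
    # Placeholder small-signal parameters (bulk transconductance ~ 0.2 * gm).
    params = dict(gmb1=20e-6, gmb3=20e-6, gds1=5e-6,
                  rds1=200e3, rds4=200e3, rds5=200e3, rds7=200e3)
    for gm5 in (1e-6, 5e-6, 20e-6, 100e-6):   # gm5 is set by the CTRL voltage
        print(f"gm5 = {gm5 * 1e6:5.1f} uS -> gain = {vga_gain_db(gm5, **params):6.1f} dB")

With these placeholders the gain falls monotonically as gm5 (and hence the CTRL voltage) increases, which is exactly the control behaviour described for the proposed VGA.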
To improve its common-mode and differential-mode performance, the topology was extended by a couple of bridged transistors N 8x that lightly balance current between two differential pairs. III. S MALL -S IGNAL A NALYSIS A. Small-Signal Analysis of VGA For better understanding of the gain control technique used in the developed VGA, the low-frequency small-signal analysis has been performed. The small-signal model of the proposed VGA (without CMFB circuit and second stage) is shown in Fig. 5. Since the selected topology is symmetrical, in the small signal analysis, we can consider half-circuit model and evaluate impact of +Vin and −Vin separately, and then simply sum the contribution of both parts. The total low-frequency gain of the proposed VGA circuit can be written as A = Gm · Rout , (1) where Gm and Rout represent the total transconductance and the total output resistance of the proposed VGA, respectively. MIPRO 2016/MEET gm5 + gds5 − gmb3 gds1 + gm5 + gds5 (2) where gmb1 and gmb3 are bulk transconduntance of transistors M1 and M3, gm5 is a transconductance of transistor M5, while gds1 and gds5 are output transconductance of transistors M1 and M5. If we consider that gm3 >> gds5 , then equation 2 can be rewritten as follows: Gm1 = gmb1 · gm5 − gmb3 = gmb1 · K − gmb3 (3) gds1 + gm5 K= 1 gm5 = gds1 gds1 + gm5 1+ gm5 (4) One can observe that the total transconductance of VGA (Gm ) can be controlled by transconductance gm5 , which depends on the control voltage (CTRL). Coefficient K can vary in the range from 0 to 1. If gm5 << gds1 the coefficient K form equation 4 is become zero and the total transconductance of VGA will be equal to −gmb3 . In the opposite case (when gm5 >> gds1 ), coefficient K is equal to 1 and Gm1 will be equal to gmb1 − gmb3 . This meas that by increasing the transconductance gm5 , the total transconductance Gm1 is decreased. Besides, the total transconductance Gm also depends on gds1 /gm5 ratio. If numerator and denominator of the ratio are divided by Ids1 , the following expression can be written: gds1 gds1 VCT RL − Vth1 I = , = gds1 m5 gm5 2 · (VA1 + Vds1 ) Ids1 (5) where VCT RL is the control voltage of VGA, and Vth1 , VA1 and Vds1 is the threshold voltage, Early voltage and voltage between source and drain of transistor M1, respectively. From equation 5, it can be observed that gds1 /gm5 ratio, which controls the total transconductance Gm , depends on the control voltage of the proposed VGA. Using the small-signal model depicted in Fig. 5, the output resistance Rout of the proposed VGA is expressed as Rout1 = [(1 + gm5 rds1 )rds5 + rds1 ]||rds7 ||rds4 , (6) 59 where rds1 , rds4 , rds5 and rds7 is the output resistance of transistor M1, M4, M5 and M7, respectively. Since the term gm5 rds1 rds5 > rds1 , equation 6 can be simplified to Rout = gm5 rds1 rds5 || rds4 ||rds7 | {z } | {z } A (7) B In equation 7, terms A and B represent a parallel combination of two resistances. From the VGA design point of view, if term B is maximized, the total resistance Rout of the VGA will depend on term A. In such a case, it is possible to vary Rout by transconductance gm5 that is proportional to the control voltage of the VGA. B. Small-Signal Analysis of CMFB To investigate the low frequency common–mode gain of the CMFB circuit (investigation of error amplifier character), the exhausted small–signal model has been derived (Fig. 6), where the individual parameters are also listed. 2.5 dB. 
Additionally, one can observe that using gate–driven approach, common–mode gain of the novel self–biased CMFB can be improved further up to 4 − 5times. IV. S IMULATION RESULTS In this section, simulation results achieved for the designed VGA (including the CMFB and CMFF circuits) using the supply voltage of 0.4 V are presented. The results were obtained from Corner and Monte Carlo (MC) analyses, where the process variation as well as mismatch of devices were taken into account. Since the proposed VGA is suitable for low-frequency applications, there is an assumption that results obtained by post-layout simulation will not differ significantly. The frequency response of VGA for different values of the control voltage is shown in Fig. 7. It can be observed that for the control voltage of 0.1 V, the VGA gain of about 18 dB and GBW of 1.2 MHz were achieved. These parameters were obtained for the load capacitance of 1 pF. For higher values of the control voltage, the gain decreases down to 6 dB. It is important to note that the bandwidth (BW) does not change with the CTRL voltage, which is one of advantages of the proposed VGA. f -3 2 0 A V [d B ] 0 f -3 -1 0 V -2 0 V -3 0 gmb2 gm5 + gmb5 )( ) gds3 + gds5 gm4 + gmb4 (8) Equation 8 is based on the following presumptions: ! ! ! ! GM 1 = GM 2 ∧ RO1 = RO2 ∧ RO1  RO6 ∧ GM 4 RO6 = 1 (9) Thus, equation 8 reveals a few interesting observations: common–mode gain depends only on properties of the differential pairs and current mirrors with no influence of bias circuit and transistors N 1x. Such a degree of freedom can be useful for finding the solution to comply the sensitive restrictions (Eq.9), where the last one is the most difficult to fulfill. In our case, its value was approximately 1.029 in all corners that leads to inaccuracy of calculation up to cca 60 C T R L V Fig. 6. Low frequency small–signal model of CMFB AvCM F Bapp = ( = 1 7 3 .8 k H z f0 1 0 Exhausted result of close-loop common-mode gain of CMFB derivation is however, rigorous and too difficult to interpret or extract any handy information because of three existing internal feedback loops, as already discussed. Fortunately, if considering the operational modes of individual transistors (i.e. N 1 works like linear resistor, N 4, N 6, N 7 and N 2, N 3, N 5 represent diode-connected transistors and current sources, respectivelly) approximated expression of common-mode gain of CMFB can be derived: d B 1 1 0 C T R L C T R L d B f0 d B = 1 .2 M H z = 1 7 3 .6 k H z d B = 2 8 9 .3 k H z = 0 .1 V = 0 .2 5 V = 0 .3 V 1 0 0 1 k 1 0 k F re q u e n c y [H z ] 1 0 0 k 1 M 1 0 M Fig. 7. Frequency response of the VGA As can be observed from Fig. 8, the VGA gain is varied in the range of the control voltage from 0 V to 0.33 V but it is linear only in the range from 0.3 V to 0.33 V. In the whole range, gain varies from 18 dB to 0 dB. The gain change (for low values of CTRL) caused by process variations is 3.34 dB, which represents a significant result taking into account that the proposed VGA was designed in 130 nm technology. On the other hand, variance of gain for high values of CTRL is substantially higher because the slope of the characteristics is changed by the different process corner. The worst case can be observed in fast-slow (FS) and slow-fast (SF) corner. Although the supply voltage is very low, gain is relatively stable for low values of CTRL from temperature point of view. 
For high values of CTRL, deviation of gain caused by temperature increases because the bias voltage of input transistors is temperature dependent. This deviation can be reduce using temperature compensation of the bias voltage. The important pameter of the PDDA is a CMRR. Fig. 9 shows variations of the CMRR parameter obtained from MC analysis. It can be observed that CMRR varies in the range MIPRO 2016/MEET 2 0 1 5 [d B ] 1 0 A V 5 fa s fa s s lo s lo ty p 0 -5 -1 0 0 .1 0 t n m t n m n m w n m ic a l w o s o s o s o s fa s t s lo w - fa s - s lo 0 .1 5 0 .2 0 p m o s p m o s t p m o s w p m o s V C T R L [V ] 0 .2 5 0 .3 0 0 .3 5 Fig. 8. VGA gain vs the control voltage in all process corners from -40 dB to -85 dB, while the mean value is about -60 dB. This is very good results generally, which was achieved thank to both CMFF and CMFB circuits being employed. CMFB, we individually investigated its performance. Two most important characteristics are depicted in Fig. 11 and Fig. 12. Fig. 11 shows the frequency response across all corners, where the load capacitance of 0.5 pF was considered and the reference voltage V REF was set to Vdd /2. The common–mode gain between 17.38 dB and 21.29 dB was achieved that is however, further increased by gmM 7 and gmM 8 transconductances. The achieved value of common-mode gain is sufficient to hold the common-mode output voltage approximately in the range from 188.5 mV to 199.8 mV in all corners. However, the phase of common–mode gain transfer function (not shown) moves in the range from 0 to 360 deg and therefore, compensation task could be quite challenging. One can also observe that the minimum bandwidth of 23.43 kHz was achieved at slow-slow (SS) corner. This value is sufficient for biomedical and audio applications, where the bandwidth of approximately 1 kHz and 20 kHz is required, respectively. 2 0 0 1 5 0 7 .8 1 d B 2 0 [d B ] 1 5 1 0 C M F B 1 0 0 5 A v N u m b e r o f S a m p le s 2 5 M e a n = 6 1 .3 S td D e v = 9 .7 7 0 5 0 ty p fa s s lo s lo fa s -5 0 -9 0 -8 0 -7 0 C M R R -6 0 -5 0 [d B ] -1 0 -4 0 Fig. 9. Variations of CMRR evaluated by MC analysis Since the self-biased technique was used in the proposed VGA, the PSRR parameter is also improved. Fig. 10 shows the variation of the PSRR parameter, where the process variation and mismatch of all devices were taken into account. One can observe that the PSRR parameter varies in the range from -75 dB to -25 dB, while the mean value of PSRR is about -45 dB. 2 0 0 m o s p m o s p m o s p m o s 1 0 0 1 k F re q u e n c y [H z ] 1 0 k 1 0 0 k 1 M Fig. 12 shows the CMFB output control voltage VCM F B out versus the VGA output voltage. It is obvious that CMFB is characterized by good performance in terms of the output control voltage range, where CMFB exhibits 0.12 V linear range behavior at 0.4 V supply voltage. We have to also note that FS corner belongs to the worst case corner in terms of common-mode gain and CMFB output voltage operation region. ty p fa s s lo s lo fa s 0 .3 0 0 .2 5 c m fb _ o u t [V ] 1 0 0 ic a l t n m w n m w n m t n m o s - fa s t o s - s lo o s - fa s o s - s lo w p m w p t p m p m o s m o s o s o s 0 .2 0 0 .1 2 V 0 .1 5 V N u m b e r o f S a m p le s 1 0 0 .3 5 5 0 o s - fa s t p o s - s lo w o s - fa s t o s - s lo w Fig. 11. Frequency response of CMFB M e a n = -4 8 .9 1 S td D e v = 1 0 .7 1 5 0 1 ic a l t n m w n m w n m t n m 0 .1 0 0 -8 0 -7 0 -6 0 -5 0 P S R R [d B ] -4 0 -3 0 -2 0 Fig. 10. 
Variation of PSRR obtained from MC analysis Since the designed VGA consists a novel bulk-driven MIPRO 2016/MEET 0 .0 5 0 .0 0 0 .0 0 .1 V 0 .2 c m f b _ in [V ] 0 .3 0 .4 Fig. 12. CMFB output voltage vs. common output voltage of VGA 61 15 400 24 20 500 Adm [dB] 2,25 BW [KHz] GBW [MHz] IDD [μ A] 3,00 30 Main parameters of VGA and CMFB circuits obtained from Corner Analysis are summarized in Fig. 13 and Fig.14, respectively. In the case of the proposed VGA, FF corner represents the worst case. In FF corner, the gain of 15.98 dB and current consumption of 19.57 µA were achieved. A summary of all characteristic parameters i.e. static current consumption, BW, GBW and the maximum ripple of CMFB output voltage 4VCM F B out driven by ± 200 mV differential output voltage of CMFB, can be found in Fig. 14. Because diode-connected transistors were used instead of current–mode operation transistors, one can observe substantial current fluctuation across individual corners affecting also BW and GBW. The worst case static consumption of approximately 12.5 µA was observed for fast-fast (FF) corner. Since the proposed VGA is based on two input differential pairs and uses the cross-coupled topology, the total current consumption does not depend on the control voltage CTRL, which is considered as advantage. 5 12 6 0 SF FS 0 SS 0,00 FF 100 0,75 0 200 10 18 300 1,50 TT Corners 120 100 10 8 80 60 SF 2 20 0 0 SS 0 FF 200 TT 50 0 40 400 100 6 600 4 Vcmfb_out [mV] 150 Δ 800 BW [kHz] GBW [kHz] IDD [μ A] 200 12 Fig. 13. Summary of the VGA main parameters across all corners. FS Corners Fig. 14. Summary of the CMFB main parameters across all corners. V. C ONCLUSION The novel VGA circuit based on PDDA topology for low-voltage applications, designed in 130 nm CMOS technology, was presented. As demonstrated, the designed VGA represents a building block for low-power and low-voltage applications, where the supply voltage of less than 0.6 V, differential signal processing as well as high dynamic range and low distortion are required. The future work will be 62 TABLE I M AIN PARAMETERS OF THE PROPOSED VGA Parameter VCT RL [V] Av tun. range [dB] Condition Min Typ Max VDD = 0.4 V 0 - 0.33 0 ÷ 0.33 0 - 18 Av Lin-in-dB [dB] 0.3 ÷ 0.33 0 - 8 GBW [Hz] CL = 1 pF 289k - 1.2M CL = 1 pF 173.6k - 173.8k CMRR [dB] VDD = 0.4 V −40 −60 −85 PSRR [dB] VDD = 0.4 V −25 −45 −75 Av , cmfb [dB] CL = 0.5 pF 17.4 20.1 21.29 GBW, cmfb [kHz] CL = 0.5 pF 290 570 820 BW, cmfb [kHz] Lin. op. range, cmfb [V] CL = 0.5 pF 20 50 90 - 0.12 0.13 0.15 Total power [µW] VDD = 0.4 V 1.39 3.45 7.84 BW [Hz] led towards design of a logarithmic function generator in order to increase the linear-in-decibel range as well as on further improvement of the main VGA parameters in terms of ExG requirements. ACKNOWLEDGMENT This work was supported by the Slovak Republic under grants VEGA 1/0762/16 and VEGA 1/0823/13. R EFERENCES [1] H. D. Lee, K. A. Lee, and S. Hong, “A Wideband CMOS Variable Gain Amplifier With an Exponential Gain Control,” Microwave Theory and Techniques, IEEE Transactions on, vol. 55, no. 6, pp. 1363–1373, June 2007. [2] P.-C. Huang, L.-Y. Chiou, and C.-K. Wang, “A 3.3-V CMOS wideband exponential control variable-gain-amplifier,” in Circuits and Systems, 1998. ISCAS ’98. Proceedings of the 1998 IEEE International Symposium on, vol. 1, May 1998, pp. 285–288 vol.1. [3] T. Yamaji, N. Kanou, and T. Itakura, “A temperature-stable CMOS variable-gain amplifier with 80-dB linearly controlled gain range,” Solid-State Circuits, IEEE Journal of, vol. 37, no. 5, pp. 
553–558, May 2002. [4] A. Suadet and V. Kasemsuwan, “A 1 Volt CMOS Pseudo Differential Amplifier,” in TENCON 2006. 2006 IEEE Region 10 Conference, Nov 2006, pp. 1–4. [5] M. Shahabi, R. Jafarnejad, J. Sobhi, and Z. Daei Kouzehkanani, “A novel low power high CMRR pseudo-differential CMOS OTA with common-mode feedforward technique,” in Electrical Engineering (ICEE), 2015 23rd Iranian Conference on, May 2015, pp. 1290–1295. [6] F. Khateb and S. Vlassis, “Low-voltage Bulk-driven Rectifier for Biomedical Applications,” Microelectron. J., vol. 44, no. 8, pp. 642–648, Aug. 2013. [7] G. Raikos, S. Vlassis, and C. Psychalinos, “0.5 V bulk-driven analog building blocks,” International Journal of Electronics and Communications, vol. 66, no. 11, pp. 920 – 927, 2012. [8] J. Carrillo, G. Torelli, R. Prez-Aloe, and J. Duque-Carrillo, “1-V rail-to-rail CMOS OpAmp with improved bulk-driven input stage,” IEEE Journal of Solid-State Circuits, vol. 42, no. 3, pp. 508–516, 2007. [9] J. Carrillo, G. Torelli, M. Dominguez, R. Perez-Aloe, J. Valverde, and J. Duque-Carrillo, “A Family of Low-Voltage Bulk-Driven CMOS Continuous-Time CMFB Circuits,” Circuits and Systems II: Express Briefs, IEEE Transactions on, vol. 57, no. 11, pp. 863–867, Nov 2010. [10] F. Castano, G. Torelli, R. Perez-Aloe, and J. Carrillo, “Low-voltage rail-to-rail bulk-driven CMFB network with improved gain and bandwidth,” in Electronics, Circuits, and Systems (ICECS), 2010 17th IEEE International Conference on, Dec 2010, pp. 207–210. MIPRO 2016/MEET Relaxation Oscillator Calibration Technique with Comparator Delay Regulation J. Mikulić*, G. Schatzberger* and A. Barić** ams AG, Graz, Austria University of Zagreb/Faculty of Electrical Engineering and Computing, Zagreb, Croatia josip.mikulic@ams.com * ** Abstract – This paper presents an improved technique for the calibration of the relaxation oscillators with respect to the delay of the comparators. The drawbacks of the conventional topology for the relaxation oscillators are analyzed. Based on the analysis, the circuit modification which resolves the effects of the comparator delay in the trimming procedure is proposed. The simulations in ams 0.18μ CMOS technology exhibit more than 5x the improvement in the precision compared to the conventional topology, evaluated in the temperature range from -40 to 125 ○C. I. INTRODUCTION A stable clock reference is one of the basic building blocks of every digital and mixed-signal circuit. Together with the ever-increasing trend of the semiconductor industry, the clock references tend to be implemented as a full on-chip solution in the systems that require low power consumption and low production cost [1-5]. As a drawback compared to the crystal oscillators, they suffer from the reduced accuracy, most of the time being in the range of several percentage points [1]. Although some techniques have been adapted in order to minimize the influence of the process and the supply voltage variations [1-3], they rely on the stable references over the temperature, which are not always at the disposal for low-cost, full on-chip solutions. On the other hand, the techniques for the process and temperature compensation of the references, such as [6], have a limited success, and are always inferior to the performance of the crystal oscillators. Therefore, it is obvious that the trimming procedure should be considered for the increased accuracy. In this paper, the digital trimming technique is described, similar to the technique presented in [4]. 
Combined with the LUT (Look-Up Table) and a temperature sensor, it enables the calibration of the oscillator in the entire temperature range. This principle is illustrated using the conventional relaxation oscillator topology, shown in Fig. 1, which has the advantage of having the calibration possibility with the referent currents, in contrast to ring and harmonic oscillators where the trimming of the resistors and capacitors would be needed. Finally, the modification of the topology which ensures more accurate results of the calibration procedure is proposed, with a negligible increase of the circuit complexity and power consumption. This paper is organized as follows. In Section II the conventional topology is described and analyzed. The MIPRO 2016/MEET Figure 1. The conventional relaxation oscillator topology. Section III proposes the improvement of the design, while in Section IV the improved topology is verified with the simulations. The final conclusions are given in Section V. II. CONVENTIONAL OSCILLATOR ARCHITECTURE A. Timing Analysis The conventional topology of the relaxation oscillator is shown in Fig. 1. As seen in Fig. 2, the capacitor C charges and discharges with the current IREF. The switches S1 and S2 alternate the charging and discharging process every half-cycle. The two comparators, biased with the current IB, activate the set and reset signals of the SR flipflop at the moments when the capacitor voltage VC gets higher and lower than the referent voltages VREFhigh and VREFlow, respectively. From Fig. 2 it can be seen that the duration of one period is determined by the slew rate of the capacitor voltage (SRVC = C/IREF), the difference of the Figure 2. One period of the capacitor voltage VC. 63 referent voltages (ΔVREF = VREFhigh – VREFlow), and the time needed for the comparators to activate the set and reset signals (td1, td2). If we neglect the switching delays of the flip-flop and the switches, and assume that td1 = td2 = td, the expression for the period is then calculated as follows:    C VREFhigh  VREFlow  2CVREF T  2  td 1  td 2    4td  I I REF REF   The comparator delay td observed in (1) represents a serious problem in low-power and high-precision designs [1]. First of all, it takes a significant portion of the total period duration, unless an excessive amount of power is consumed. Furthermore, it is never precisely known, being influenced by parasitics, as well as the process, temperature, supply and referent voltage changes. It also has a negative influence on the calibration procedure, explained in the following section. B. Calibration Method From (1) we can observe that the oscillation period can be modified with the current IREF. A straightforward way to realize this is a current DAC (Digital to Analog Converter) controlled by the temperature sensor. The block scheme of such a system is shown in Fig. 3. As a result, the current IREF is defined as  I REF  B  I REF0  B  T0  4td  Tdes  4td  where T0 represents the measured period with B = 1. From (4) we can observe that for the precise calibration, the parameter td has to be known upfront. As this is never the case, the method for bypassing this effect is developed in the following section. III. MODIFIED OSCILLATOR ARCHITECTURE A. Comparator Delay Analysis At the beginning, the analysis of the comparator delay is conducted. For this purpose, two complementary symmetrical OTA (Operational Transconductance Amplifier) topologies shown in Fig. 4 are employed. 
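The impact of the comparator delay on this calibration can be illustrated numerically. In the notation above, (1) reads T = 2·C·ΔVREF/IREF + 4·td and (4) reads B = (T0 − 4·td)/(Tdes − 4·td). The Python sketch below uses the component values quoted later in Section IV (C ≈ 1 pF, ΔVREF = 0.6 V, IREF0 = 1.2 µA), while the actual comparator delay of 40 ns is an arbitrary placeholder.

def period(c, dv_ref, i_ref, t_d):
    # Oscillation period of the conventional topology, eq. (1).
    return 2 * c * dv_ref / i_ref + 4 * t_d

def trim_factor(t_0, t_des, t_d_assumed):
    # Trim factor from eq. (4): B = (T0 - 4*td) / (Tdes - 4*td).
    return (t_0 - 4 * t_d_assumed) / (t_des - 4 * t_d_assumed)

if __name__ == "__main__":
    c, dv_ref, i_ref0 = 1e-12, 0.6, 1.2e-6    # values quoted in Section IV
    t_d_actual = 40e-9                         # placeholder comparator delay
    t_des = 1e-6                               # desired 1 MHz period
    t_0 = period(c, dv_ref, i_ref0, t_d_actual)    # period measured at B = 1
    for t_d_assumed in (t_d_actual, 0.0):          # known delay vs. ignored delay
        b = trim_factor(t_0, t_des, t_d_assumed)
        t_trim = period(c, dv_ref, b * i_ref0, t_d_actual)
        err = (1.0 / t_trim - 1.0 / t_des) / (1.0 / t_des) * 100
        print(f"assumed td = {t_d_assumed * 1e9:4.0f} ns -> B = {b:.3f}, "
              f"residual frequency error = {err:+.2f} %")

Ignoring a 40 ns delay in this example leaves a residual error of about 2 %, which is the kind of inaccuracy that the modified architecture of this section is designed to remove.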
Using the symmetrical OTA topology has the advantage of reduced systematic offset and better transfer characteristic in contrast to the basic, asymmetrical OTA topology. The complementary topologies are suitable because of the different voltage levels at the inputs of each comparator. In Fig. 5 the time diagram of the low-to-high transition of the clock voltage VCLK is shown. Currents and voltages correspond to the OTA in Fig. 4(a), i.e. to COMP1 in Fig. 1. For t < 0, the voltage at the inverting input of the comparator (VREFhigh) is larger than the voltage at the noninverting input (VC). From this it follows that the current ID1 is larger than ID2, the transistor M10 is in the linear re-  where IREF0 is the value of the unaltered referent current, while B is the referent current multiplication factor. Equation (1) then becomes:  T 2CVREF  4td  B  I REF0  From (3) we can derive the expression for the factor B needed to trim the oscillator to the desired period (Tdes) at a certain temperature: (a) (b) Figure 3. The referent current and voltage generator, together with the calibration circuitry. 64 Figure 4. The symmetrycal OTA (a) with nMOS input pair (COMP1) (b) with pMOS input pair (COMP2) . MIPRO 2016/MEET Furthermore, we can safely neglect the higher order effects in (6) related to ΔV2, as the voltage ΔV in this application will be low. As a result, the expression in (6) is reduced to  I D1, 2 IB  1  2 KT 1  V  2 IB       For t > 0, the current IOUT, which charges and discharges parasitic capacitances at the output node of the comparator, can be written as  I OUT  I D8  I D10  I D 2  I D1   Equation (9) combined with (8) leads to Figure 5. The time diagram of the comparator signals during the lowto-high transition of the clock voltage VCLK. gion, while the transistor M8 is in the saturation region. As a consequence, the voltage VOUT is at the ground level. At the moment t = 0, the voltages at the inputs are equal, as well as the currents ID1 and ID2. At this moment, the transistor M10 starts to enter the saturation region, while the output current IOUT, which charges the parasitic capacitance Cp at the output, becomes equal to the difference of the currents ID8 and ID10. The voltage VOUT then rises and reaches the ΔVthr level of the flip-flop at t = td, which then changes the output state of the flip-flop from low to high. At this moment, the states of the switches S1 and S2 also reverse, and the voltage VC now starts to decrease with the identical slew rate. As a result, the waveforms of the currents ID1 and ID2 are mirrored around t = td. They become equal again at t = 2td, after which they discharge the voltage VOUT to the ground level. In order to describe the behavior quantitatively, first we use the equation for the strong inversion of the MOS transistor, with neglected channel length modulation effect: I D  KT (VGS  VT ) 2    where ID is the drain current, KT is the technology and design constant, VGS is the gate-source voltage, and VT is the threshold voltage. 
Using (5), we can derive the following equation for the differential currents of the transistors M1 and M2:  I D1, 2 IB  1  KT 1  V 2 IB  2 KT V 2   IB   V  VIN   VIN   VC  VREFhigh  MIPRO 2016/MEET I OUT  V 2KT I B   For 0 < t < td, the differential voltage ΔV is equal to V (t )   I REF  t  C  Combining (10) with (11), we obtain the current IOUT as a function of time for 0 < t < td :  I OUT (t )  I REF  2 KT I B  t  C  The delay td of the comparator is experienced as the time needed for the output voltage to charge up to the threshold voltage ΔVthr of the SR flip-flop (approximately half of the supply voltage). As the following is valid:  Vthr  1 td  I OUT (t )dt  Cp 0  the expression (13) combined with (12) then gives  Vthr  1 td I REF  2 KT I B  tdt   Cp 0 C  After the integration, we obtain the following expression for the delay td of the comparator td  2CVthrC p    A similar analysis can be performed for the high-tolow transition. Although in this case the parameters Cp, ΔVthr and KT slightly differ, this difference can be neglected for the purpose of this simplified analysis. where   I REF 2 KT I B   65 B. Calibration Method with Modified Architecture After combining (1) and (15), the expression for the duration of the clock period becomes  T 2CVthrC p 2CVREF 4  B  I REF0 B  I REF0 2 KT I B  If we set the bias current IB in the following relation with IREF0  I B    B 2 I REF0   circuit is shown in Fig. 6. First, we start from the current equation valid for MOS transistors in weak inversion. The currents IB1 and IB2 are equal to             Each parameter in (18), excluding the factor B, is a constant determined by the design and process at a certain temperature. If we compare (18) with (3), we can see that the delay of the comparators can be modified by varying the factor B in the same way as the rest of the expression. As a result, the expression (4) reduces to  T B  0  Tdes  I R   I R  I B 2  I B1 exp B1 B   I B1 1  B1 B    nkT / q   nkT / q   I B1  I REF  B  I REF0   I B  I B 2  I REF   together with  the expression for the bias current IB turns into 2 I B  B 2 I REF 0   C. Square Function Realization The improved calibration method, according to (17), requires the square relationship between the bias current IB and the parameter B. Exponential behavior of the MOS transistors in weak inversion will be exploited in order to realize this relationship. The schematic of the proposed  for IB1RB/(nkT/q) < 1. If we consider that  from which it follows that an improved precision will be achievable, since the only parameter required for the calibration is the measured period T0.  where I0 is the constant determined by the design and process, n is the emission coefficient, k is the Boltzmann constant, T is the temperature in Kelvin and q is the charge of an electron. 
From (20), we can calculate the relationship of the two currents: where κ is an arbitrary constant, the expression for the clock period then becomes equal to  2 I REF0  4 CVthrC p    KT 1 2CVREF  T   B  I REF0 I REF0     V  I B1, 2  I 0 exp GS 1, 2    nkT / q   RB    B 2 I REF0  nkT / q  where   I REF0 RB  nkT / q  Although in theory κ close to or larger than one would induce the higher order effects in the approximation made in (21), in practice it will push the transistor MB2 towards the strong inversion region, compensating for the effects and making the approximation valid again. Also note that the circuit in Fig. 6 can be scaled down to reduce the power consumption. IV. SIMULATIONS To confirm the observations made in the previous sections, the simulations in ams 0.18μ CMOS technology are performed. The basic topology from Fig. 1 is combined with the reference generators and the calibration circuitry shown in Fig. 3. The improved topology also utilizes the bias current generator from Fig. 6. Figure 6. The square function generator. 66 For the simulation purposes, the supply voltage VDD is set to 1.8 V, the capacitor C to around 1 pF, the current IREF0 and IB to 1.2 μA, while the referent voltages VREFhigh and VREFlow are equal to 1.2 V and 0.6 V, respectively. The factor κ is approximately set to 1. Considering everything MIPRO 2016/MEET Basic topology (1) Improved topology (2) Figure 7. Absolute frequency error after trimming vs. the temperature, plotted for 5 process corners of the basic (1) and the improved (2) topology. mentioned before, it is clear from (1) that the resulting frequency will be somewhere around 1 MHz, not counting the effects of the comparator delay. Shown in Fig. 7, the simulation results of the calibration procedure demonstrate the improvement. The results are plotted over the temperature range from -40 to 125 ○C, with 5 different process corners considered for both topologies. The basic topology exhibits the absolute frequency error from -0.9% to 0.4% in the complete temperature range. The error falls down to a range from -0.05% to 0.2% with the improved topology. That corresponds to the error reduction of more than five times. Moreover, the improved topology exhibits almost ideal behavior in the temperature range from -40 to 75 ○C, with the absolute frequency error around ±0.05%. The frequency error for the improved topology is shown separately in Fig. 8. V. CONCLUSION A new topology for the improved calibration technique of the relaxation oscillators is proposed. Based on the analytical calculations, the method for neutralizing the comparator delay is developed. The proposed technique is verified with the simulations in ams 0.18μ CMOS process. MIPRO 2016/MEET Figure 8. Absolute frequency error after trimming vs. the temperature, plotted for 5 process corners of the improved topology. The results reveal the possibility to achieve a low-power, full on-chip solution, having the accuracy comparable to the accuracy of crystal oscillators. REFERENCES [1] [2] [3] [4] [5] [6] Y. Tokunaga, S. Sakiyama, A. Matsumoto, S. Dosho, “An on-chip CMOS relaxation oscillator with voltage averaging feedback,” J. Solid-State Circuits, vol.45, no. 6, pp. 1150-1158, Jun. 2010. T. Tokairin, et al., “A 280 nW, 100 kHz, 1-cycle start-up time, onchip CMOS relaxation oscillator employing a feed-forward period control scheme,” in Proc. Dig. Symp. VLSI Circuits, pp. 16-17, Jun. 2012. K.-J. 
Hsiao, "A 32.4 ppm/°C 3.2-1.6V self-chopped relaxation oscillator with adaptive supply generation," in VLSI Circuits Symp. Dig. Tech. Papers, pp. 14-15, Jun. 2012. A. Vilas Boas, A. Olmos, “A temperature compensated digitally trimmable on-chip IC oscillator with low voltage inhibit capability,” in Proc. IEEE Int. Symp. Circuits and System (ISCAS), vol. 1, pp. 501-504, Sep. 2004. Y. H. Chiang, S. I. Liu, “A submicrowatt 1.1-MHz CMOS relaxation oscillator with temperature compensation,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 60, no. 12, pp. 837-841, Oct. 2013. K. Sandaresan, P.E. Allen, and F. Ayazi, “Process and temperature compensation in a 7-MHz CMOS clock oscillator,” IEEE J. SolidState Circuits, vol. 41, no. 2, pp. 433-441, Feb. 2006. 67 A Bootstrap Circuit for DC–DC Converters with a Wide Input Voltage Range in HV-CMOS N. Mitrovic, R. Enne and H. Zimmermann Institute of Electrodynamics, Microwave and Circuit Engineering Vienna University of Technology Vienna, Austria e-mail: natasa.mitrovic@tuwien.ac.at, reinhard.enne@tuwien.ac.at and horst.zimmermann@tuwien.ac.at Abstract - Bootstrap circuits are essential parts of integrated DC–DC converters with an NMOS transistor as high-side switch, which provide voltage overdrive for the gate drivers and to drive the high-side switch’s gate. This paper presents a bootstrap circuit which doesn’t need an additional supply voltage for charging the bootstrap capacitor to a desired voltage level, but uses the input voltage of the converter. Other advantages of this circuit are proper operation over a wide range of duty ratios and input voltages (from 7.5 V to 45 V). The circuit is designed in 0.18 µm 50 V high-voltage (HV) CMOS technology as part of a buck DC–DC converter for a high output power. Post layout simulations show a voltage dip lower than 80 mV when the high-side NMOS transistor is switched on and the power loss of the bootstrap circuit equal to 160 mW for nominal conditions (input voltage 36 V, duty ratio 15%, bootstrap voltage 4.8 V, input power of drivers 115 mW). The layout dimensions of the bootstrap circuit are 96 µm × 251 µm. I. INTRODUCTION Switch mode step down DC–DC converters are widely used due to their high efficiency even at high input power, compared to low-drop-out (LDO) regulators. The use of an NMOS transistor as high-side (HS) switch is preferable rather than having a PMOS HS switch, since it can achieve low on-resistance with usually three times smaller transistor width compared to a PMOS. This means that considerably less area is needed for large-area high-voltage transistors, and, more important, better efficiency. The main disadvantage is that a bootstrap circuit together with the bootstrap capacitor has to be implemented. The bootstrap circuit has to provide enough energy to the bootstrap capacitor so it can properly supply the HS drivers to switch the HS NMOS correctly. The conventional bootstrap circuits consist of an external diode connected to an additional voltage supply and a capacitor, which is not area efficient and can be potentially a source of instability. Depending on the technology, a bootstrap circuit together with smaller bootstrap capacitor can be implemented on chip, but additional voltages must be provided [1]-[5]. This paper presents a bootstrap circuit which is supplied from the input voltage of the converter, so the need for additional high voltage sources is eliminated. 
The voltage The authors would like to thank the Austrian BMVIT via FFG, the ENIAC Joint Undertaking and AMS AG for financial funding in the project eRAMP. 68 Figure 1. Structure of the buck converter on the bootstrap capacitor is not determined by the input voltage, which represents one big advantage, so this topology of the circuit is easily adjustable to fit the requirements of wide range DC–DC converters specifications. I. CONVERTER OVERVIEW Fig. 1 illustrates the simplified structure of the investigated buck converter. The chip design is divided into high-side (HS) switch transistor MH and low-side (LS) switch transistor ML with drivers DRH and DRL, respectively, and regulation loop. The switching node (SN), whose potential is denoted as VSSH, is connected to the offchip smoothing inductor (LSM) and capacitor (CSM). The drivers for HS and LS switching transistors, are cascade connected tapered inverters. During switching on the power transistors, the drivers have to inject a significant amount of charge into the gates of the power transistors in order to enable their fast turn on and to minimize the switching losses. The regulation loop is based on a currentprogrammed controller, i.e. peak current-mode control is used, which determines and controls the duty cycle of the converter. All blocks in the dotted box are realized in an N-well isolated 0.18-μm high-voltage CMOS technology. The circuit is designed for 36 V input voltage and 5.5 V output voltage with 1 A output current. An external inductor and capacitor are used as output filter. MIPRO 2016/MEET A. Power MOSFETs sizing Main losses of the converter are usually caused by drivers of the switch transistors and their on-resistance [6]. The power loss in the gate drivers is given by (1), where eG,MH, eG,ML are the energy per transistor width needed to switch on the switch transistors, wMH, wML are the widths of the power MOSFETs and f is the switching frequency. Equation (2) represents the conduction losses of MH and ML, assuming continuous conduction mode (CCM), where ron,MH and ron,ML represent the on-resistances related to the widths, I is the mean output current and δ is the duty ratio. 𝑃𝑑𝑟𝑖𝑣𝑒𝑟 = 𝑓 ∙ (𝑒𝐺,𝑀𝐻 ∙ 𝑤𝑀𝐻 + 𝑒𝐺,𝑀𝐿 ∙ 𝑤𝑀𝐿 ) (1) 𝑟𝑜𝑛,𝑀𝐻 𝑟𝑜𝑛,𝑀𝐿 (1 − 𝛿)] 𝑃𝑟𝑒𝑠 = 𝐼 2 ∙ [ ∙𝛿+ 𝑤𝑀𝐻 𝑤𝑀𝐿 (2) By minimizing the sum of these two equations, relations for the transistor widths is given in (3) and (4): 𝑤𝑀𝐻 = 𝐼 ∙ √ 𝛿 ∙ 𝑟𝑜𝑛,𝑀𝐻 𝑓 ∙ 𝑒𝐺,𝑀𝐻 (3) Figure 2. Circuit diagram of proposed bootstrap circuit II. (1 − 𝛿) ∙ 𝑟𝑜𝑛,𝑀𝐿 𝑤𝑀𝐿 = 𝐼 ∙ √ 𝑓 ∙ 𝑒𝐺,𝑀𝐿 (4) After replacing the values obtained from simulations and values extracted from the specifications of the converter, the width of the MH transistor is calculated to be wMH = 7 cm, and the width of ML is wML = 14 cm. Drivers are designed in such a manner that the last inverter in the chain can drive the gate capacitance of the switching transistor, therefore the tapering factor and the number of stages of the inverters are determined relative to the width of the power transistors. B. Sizing of the bootstrap capacitor From simulations the electrical charge of QG = 5 nC needed for HS MOSFET to switch on is calculated by integrating its gate current during its transition time between off and on state. This charge has to be provided by the bootstrap capacitor, by releasing some charge into the DRH driver which redirects it to the MH gate. The capacitance value of the bootstrap capacitor CBS is determined by setting the voltage drop of VBC being tolerable for this charge loss. 
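The width optimization in (1)-(4) above is straightforward to reproduce numerically, as the Python sketch below shows. The per-width on-resistances and gate-switching energies are placeholders (the text quotes only the resulting widths), chosen so that the result lands near the reported wMH = 7 cm and wML = 14 cm; the output current, nominal duty ratio and 1 MHz switching frequency are taken from the converter specification used in this design.

import math

def optimal_widths(i_out, delta, f_sw, r_on_mh, r_on_ml, e_g_mh, e_g_ml):
    # Switch widths minimizing the sum of driver and conduction losses,
    # eqs. (3)-(4): wMH = I*sqrt(delta*ron_MH/(f*eG_MH)),
    #               wML = I*sqrt((1-delta)*ron_ML/(f*eG_ML)).
    w_mh = i_out * math.sqrt(delta * r_on_mh / (f_sw * e_g_mh))
    w_ml = i_out * math.sqrt((1 - delta) * r_on_ml / (f_sw * e_g_ml))
    return w_mh, w_ml

if __name__ == "__main__":
    w_mh, w_ml = optimal_widths(i_out=1.0,       # mean output current [A]
                                delta=0.15,      # nominal duty ratio
                                f_sw=1e6,        # switching frequency [Hz]
                                r_on_mh=1.0,     # placeholder on-resistance per width [ohm*cm]
                                r_on_ml=1.0,
                                e_g_mh=3.0e-9,   # placeholder gate energy per width [J/cm]
                                e_g_ml=4.3e-9)
    print(f"wMH ~ {w_mh:.1f} cm, wML ~ {w_ml:.1f} cm")

Doubling the assumed gate energy, for example, shrinks the optimum width by a factor of √2, which is the balance between the driver losses of (1) and the conduction losses of (2) that the optimization strikes.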
In this case, the initial voltage VBC,1 is 5 V, the tolerable voltage drop was set to 100 mV, and the capacitance CBS was calculated from the charge stored on a capacitor, Q = C·V. The charge initially stored in the bootstrap capacitor is Qstored,1 = CBS·VBC,1 and after the discharge it is Qstored,2 = CBS·VBC,2 with VBC,2 = 4.9 V, where QG = Qstored,1 − Qstored,2. The bootstrap capacitance is then calculated as
$C_{BS} = \dfrac{Q_G}{V_{BC,1} - V_{BC,2}} = 50\,\mathrm{nF}.$ (5)
Since an increase in the power consumption of the produced chip compared to the simulated design is expected, the finally chosen bootstrap capacitor is double the calculated value, i.e. CBS = 100 nF. CIRCUIT IMPLEMENTATION Fig. 2 shows the bootstrap circuit. The core of this circuit topology consists of the high-voltage NMOS transistor M4, the diodes D1 to D9 and the current mirror M1–M2. The drain current of transistor M4 directly charges the bootstrap capacitor. The stack of diodes D1 to D9 indirectly defines the maximum achievable bootstrap capacitor voltage, which can be represented as 9·VD − Vth,M4, where Vth,M4 is the threshold voltage of M4. The high-voltage NMOS transistor M3 works as a current source, providing the bias current for the current mirror M1–M2. The current of transistor M3 is determined by the size of the transistor and by the voltage VBIAS, which can come from any low-voltage DC source already used by other regulation circuits, or from the bandgap voltage that is usually used as the reference voltage in the regulation loop of the converter. When MH is turned on, the switching node SN is at a high potential, close to the input voltage of the converter, VIN. Transistor M4 is off and charging of the bootstrap capacitor is disabled. Only the transistors M1 and M3 conduct a very small reference current, which cannot be mirrored due to the high potential of node VG4. Capacitor C1 holds the gate potential of M4 during its off period, i.e. it keeps the voltage VGS4 positive and prevents damage to the transistor. Diode D10 also plays a significant role during this interval by blocking the reverse current of M4, i.e. preventing M4 from discharging the bootstrap capacitor. Diode D11 has a similar function: it prevents C1 from discharging through transistor M2. In the opposite interval, when the LS power switch ML is conducting, the bootstrap capacitor is charged. Transistor M2 now conducts, mirroring the current of M1. One part of the M2 current charges capacitor C1, while the diodes D1–D9 conduct the remaining part. The gate-source voltage of M4 is now positive and above the threshold voltage, so M4 conducts a significant current which charges the bootstrap capacitor. Figure 5. Power loss and bootstrap voltage dependence on the input voltage (duty ratio 0.15) C1 and the diodes D1–D9, Fig. 3b. At first, most of the current flows into the capacitor C1, so the voltage VG4 rises, which then increases the diode current; in the second part of the interval the diode current becomes dominant. The dip of the voltage VBC, defined as the difference between its values when MH is off and on, is very low, in the range of 80–100 mV, as can be seen in Fig. 3c. The layout of the bootstrap circuit is given in Fig. 4. The capacitor C1, which had to be split into two series-connected capacitors because of the technology rule on the maximum allowed voltage between terminals, occupies most of the area of the circuit. III. Figure 3.
a)-c) Transient waveforms of the voltage and currents of the bootstrap circuit: a) IM4 – drain current of the M4 transistor, VSSH and VDDH – potentials at the bootstrap capacitor terminals; b) IM2 – source current of the M2 transistor, IC1 – current of the capacitor C1, ID1 – current of the diode D1; c) VBC – votage across the bootstrap capacitor. Transient simulation traces for nominal conditions of the most important signals are shown in Fig. 3. During the time when MH is conducting, voltages VSSH and VDDH are at high potential, so the bootstrap circuit is inactive, M4 is not conducting, which is clearly visible on the waveform, Fig. 3a. When MH is switched off, the drain current of M4 is increasing due to raise of its gate voltage, and the bootstrap capacitor is being charged. It also shows how the M2 current is distributed over the time between capacitor SIMULATION RESULTS Post layout transient simulations of the circuit were run with varying two parameters: input voltage of the converter and duty ratio. In both cases two outcomes are interesting to monitor: the final voltage across the bootstrap capacitor, VBC, and the power loss of the bootstrap circuit. All the simulations are performed at switching frequency of 1MHz. The goal for the final voltage across the bootstrap capacitor is set to be 4.5 V for two reasons. Firstly, this way the proper operation of the drivers is guaranteed, so the reliability that they can always switch is high. The other reason is reduction of the conduction losses of MH transistor, since when it is conducting, its gate-source voltage is equal to the VBC. The goal could have been put even lower, since the drivers can operate properly even with the VBC voltage as low as 2.5 V and the threshold voltage of the MH is 0.7 V, at the cost of efficiency and reliability. In Fig. 5 it can be observed that power losses are strongly correlated with the input voltage of the converter, with almost linear dependency. The main percentage of the circuit losses is due to the fact that M4 and D10 conduct during the period when the switching node is at low potential, so current of some milliamps at such large voltage difference will make significant loss due to the 36 V input voltage, which is the target input voltage of this converter. Even so, the loss of the circuit which is equal to 160 mW at this operating condition is acceptable in comparison to the 5.5 W converter output power, i.e. the loss of the bootstrap circuit will reduce the efficiency of the whole converter by less than 3%. Figure 4. Layout view of the bootstap circuit 70 MIPRO 2016/MEET directly related to the input voltage, so the circuit can be used for a wide range of converters. Also, the calculation of the widths of the HS and LS power switch transistors is shown, based on which then the value of the bootstrap capacitor capacitance is calculated. The voltage across the bootstrap capacitance being larger than 4.5 V in the realized circuit is obtained for input voltages already above 7V and duty ratios smaller than 55%. Simulations show also good stability of the bootstrap voltage when the input voltage and the duty ratio are varying. ACKNOWLEDGMENT Figure 6. Power loss and bootstrap voltage dependence on the duty ratio (input voltage 36 V) The maximum achievable bootstrap voltage is mostly limited by the input voltage, since it can be roughly defined as VBC = Vin-VD-VOV, where VD is diode forward voltage and VOV is the overdrive voltage of M4 transistor. 
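The two limits on the achievable bootstrap voltage mentioned above, the diode-stack clamp 9·VD − Vth,M4 from the circuit description and the input-voltage limit VBC ≈ Vin − VD − VOV, can be put side by side in a small sketch. The diode forward voltage, threshold voltage and overdrive used below are generic assumed values, not numbers reported in the paper.

```python
# Rough check of the two bootstrap-voltage limits discussed in the text.
# V_D, V_TH and V_OV are assumed typical values, not taken from the paper.
V_D  = 0.6    # diode forward voltage [V] (assumed)
V_TH = 0.7    # threshold voltage of M4 [V] (assumed)
V_OV = 1.5    # overdrive of M4 while charging [V] (assumed)

clamp_limit = 9 * V_D - V_TH          # limit set by the diode stack D1-D9

def input_limit(v_in):                # limit set by the converter input voltage
    return v_in - V_D - V_OV

for v_in in (7.0, 12.0, 36.0, 45.0):
    v_bc = min(clamp_limit, input_limit(v_in))
    print(f"Vin = {v_in:4.1f} V -> VBC ~ {v_bc:.1f} V")

# With these assumptions the 4.5 V target is reached only for inputs above roughly 7 V,
# and the clamp sits near the ~4.8 V nominal bootstrap voltage quoted in the abstract.
```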
So, as it can be noticed form the graph, there is a minimum input voltage of 7 V required for the bootstrap circuit to operate properly. On the other hand, when the input voltage is bigger than 7 V, the dependence of the bootstrap voltage on the input voltage is negligible, since it is then defined by the forward voltage of the diodes and the gate-source voltage of M4. The bootstrap voltage VBC is very dependent on variation of the duty ratio, Fig. 6, since it represents the time interval in which the capacitor is not charged but in which it is being discharged. The power loss of the circuit is not so highly related, but it shows a slight increment with the increase of the duty ratio, which is a consequence of a lower value of VBC and then voltage VDDH, so the voltage drop across M4 and D10 is bigger. IV. CONCLUSION This paper presents a bootstrap circuit for DC–DC converters which is supplied from the input voltage of the converter. The voltage value across the capacitor is not MIPRO 2016/MEET The authors would like to thank A. Steinmair and F. Schrank from AMS AG in Unterpremstätten, Austria, for technical support. The work has been performed in the project eRamp (Grant Agreement N°621270), co-funded by grants from Austria and the ENIAC Joint Undertaking. REFERENCES [1] [2] [3] [4] [5] [6] Seidel, A.; Costa, M.; Joos, J.; Wicht, B., “Bootstrap circuit with high-voltage charge storing for area efficient gate drivers in power management systems, ” in Proc. 40th European Solid State Circuits Conf., ESSCIRC 2014, Sep. 2014., pp.159-162. Seidel, A.; Costa, M.S.; Joos, J.; Wicht, B., “Area Efficient Integrated Gate Drivers Based on High-Voltage Charge Storing, ” in Solid-State Circuits, IEEE Journal of, vol.50, no.7, pp.15501559, July 2015. Xu, J.; Lin Sheng; Xianhui Dong, “A novel high speed and high current FET driver with floating ground and integrated charge pump, ” in Proc. 2012 IEEE Energy Conversion Congress and Exposition (ECCE), Sep. 2012., pp.2604-2609. K. Abe, K. Nishijima, K. Harada, T. Nakano, T. Nabeshima, and T. Sato, “A novel three-phase buck converter with bootstrap driver circuit,” in Proc. Power Electronics Specialists Conf., Jun. 2007, pp. 1864–1871. M. Huque, R. Vijayaraghavan, M. Zhang, B. Blalock, L. Tolbert, and S. Islam, “An SOI-based high-voltage, high-temperature gatedriver for SiC FET,” in Proc. IEEE Power Electronics Specialists Conf., PESC 2007, Jun. 2007, pp. 1491–1495. R. Enne and H. Zimmermann, “An integrated low-power buck converter with a comparator controlled low-side switch,” in Proc. IEEE 13th Int. Symp. Design Diagnostics Electron. Circuits Syst., 2010, pp. 84–87. 71 A Fractional-N Subsampling PLL based on a Digital-to-Time Converter N. Markulic1, 2, K. Raczkowski1, P. Wambacq1, 2 and J. Craninckx1 1 imec, Leuven, Belgium, Vrije Universiteit Brussel, Brussels, Belgium 2 Abstract - The paper presents a subsampling PLL which uses a 10-bit, 0.5 ps unit step Digital-to-Time Converter (DTC) in the phase-error comparison path for the fractional-N lock. The gain and nonlinearity of the DTC can be digitally calibrated in the background while the PLL operates normally. During fractional multiplication of a 40 MHz reference to frequencies around 10 GHz, the measured jitter is in the range from 176 to 198 fs. The worst measured fractional spur is -57 dBc and the in-band phase noise performance of the PLL is −108 dBc/Hz. The presented analog PLL in advanced 28 nm CMOS achieves a figure-ofmerit (FOM) of -246.6 dB that compares well to the recent state-of-the-art. I. 
INTRODUCTION Frequency synthesizers, typically implemented as phase-locked-loops (PLLs) are omnipresent building blocks used for local oscillator (LO) generation in radio frequency (RF) communication, accurate clock generation in digital circuits, clock recovery, etc. In wireless transceivers, they serve for up/down conversion of the baseband data. A very peculiar aspect of a wireless LO synthesizer is its phase noise and spurious performance. Namely, system level performance in, both, the receive (RX) and transmit (TX) chain are fundamentally limited by the LO phase noise. For example, the limit in error vector magnitude (EVM) in an RX and a TX for high-order modulation schemes is limited by the LO in-band phase noise. In an RX, adjacent channel interferers are reciprocally mixed down onto the desired signal by the LO phase noise and spurs. Moreover, the TX spectral output mask in the receive band is limited by TX LO far-out phase noise, etc. A considerable amount of energy and chip area is therefore typically spent to guarantee low phase-noise LO operation. The analog subsampling PLL [1] introduced in 2009, shows, even today, an unparalleled synthesizer efficiency amongst the CMOS frequency synthesis state-of-the-art: the lowest integrated phase-noise (i.e. RMS jitter) vs. power consumption. The extreme phase-error detection gain of this architecture reduces the in-band phase-noise and, thanks to that, allows wide bandwidths for efficient VCO noise filtering. In this way, a subsampling architecture achieves the PLL “utopia” in which the output phase-noise is mainly dominated by the reference noise [1]. At the same time, power consumption is reduced thanks to the divider-less operation. However, the architecture’s inherent integer-N operation prevents the adoption of this approach in practical wireless transceivers. Within the fractional-N PLLs, the all-digital systems (for example [2], [3]) show potential of achieving 72 performance similar to the subsampling PLL, although often at the cost of large power consumption of the time-todigital converter (TDC). Moreover, to achieve highest performance, these systems depend on increasingly complex and power consuming calibration techniques. We propose a solution that enables fractional-N operation of a subsampling PLL [4,7]. Instead of measuring the fractional residue phase-error with a TDC, we recognize that this error is known a priori and can be compensated for by a Digital-to-Time converter (DTC). In this manner, we are able to achieve fractional-N lock, while retaining the key benefits of subsampling operation. The mixed-signal solution that we propose takes advantage of nanoscale CMOS and is not limited by its analog performance. Phaseerror detection is done in essence by a switch and a capacitor that benefit from scaling and the charge pump/transconductor can be very simple since, in a subsampling PLL, their noise and linearity do not impact the overall system performance [5]. II. FRACTIONAL-N OPERATION OF THE SUBSAMPLING PLL ENABLED BY A DTC The integer-N subsampling PLL [1] operates by sampling the (differential) voltage-controlled oscillator (VCO) sinusoid with a repetition rate set by the reference frequency (Figure 1). In a phase-locked state the sampling happens precisely at the zero crossings of the differential sinewave. Any deviation from the timing of the zero crossing results in a non-zero voltage being sampled, which in turn is converted to a correction current fed to the loop filter. 
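The very large phase-detection gain that this sampling scheme provides is easy to visualise: near a zero crossing the sampled voltage is approximately A·2π·fVCO·Δt, i.e. tens of millivolts per picosecond of timing error for a GHz carrier. The short sketch below only illustrates that relation; the buffer output amplitude is an assumed value.

```python
import math

A     = 0.5      # assumed differential VCO amplitude at the sampler input [V]
f_vco = 10e9     # VCO frequency of the order used in this work [Hz]

def sampled_voltage(dt):
    """Voltage sampled when the sampling edge misses the zero crossing by dt."""
    return A * math.sin(2 * math.pi * f_vco * dt)

# Small-signal detection gain around lock (volts per second of timing error)
K_pd = A * 2 * math.pi * f_vco
print(f"K_pd ~ {K_pd:.3g} V/s  ->  {K_pd * 1e-12:.1f} mV per ps of error")

for dt in (10e-15, 100e-15, 1e-12):
    print(f"dt = {dt * 1e15:6.0f} fs -> v_sample = {sampled_voltage(dt) * 1e3:.3f} mV")
```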
Because the sampling events occur only at the edges of the reference, the VCO zero crossings are aligned to these edges and the VCO produces a frequency which is an exact integer-N multiple of the reference. The basic subsampling PLL cannot synthesize fractional-N frequencies, because it lacks any phase modulation mechanism in the loop (for example a divider). We therefore add a DTC in the reference path of the PLL as depicted in Figure 1. The DTC is used to control the exact moment of sampling, such that it always falls on the expected zero crossing of the VCO, even for non-integer ratios between the VCO frequency and the reference clock. A simple example with a fractional-N multiplication that differs from integer-N by 0.25 is shown in Figure 1. In the first cycle the sampling edge appears at the same moment as in the integer-N mode. Then, in the second cycle, the sampling edge is delayed by 0.25*Tvco after the reference edge. In the third cycle, the sampling edge is delayed by 0.5*Tvco, then by 0.75*Tvco. Finally, in the MIPRO 2016/MEET fifth cycle, the sampling should happen 1*Tvco after the reference edge, however, simply skipping a VCO cycle and sampling at the original reference edge yields to the same effect. Since the required PLL frequency and the reference frequency are known, it is possible to calculate the position of any following zero crossings with absolute precision. The DTC should cover at least 1 VCO period of delay. The digital computation of the necessary phase adjustment, i.e. of the delay that needs to be inserted in the reference path is depicted in Figure 2. The difference between the multiplication factor N and the integerquantized value is extracted first. A ∆Ʃ modulator is used to generate the integer quantization of N so that the signal Diff is a zero-mean stream accumulated with the desired “phase wrapping” behavior. The accumulated value Acc is then appropriately scaled on the available DTC input quantization range. III. IMPLEMENTATION CONSIDERATIONS If a fractional-N subsampling PLL, as described in the previous section, were implemented with an ideal DTC, it would have the same performance as an integer-N subsampling PLL. This lies in stark contrast to the case of a traditional mixed signal ∆Ʃ PLLs, where there is an unavoidable penalty associated with the modulation through the divider. Any practical implementation of the fractional-N subsampling PLL system will, however, be limited by the limitations of the DTC. Figure 1: Fractional-N subsampling PLL operation. Figure 2: DTC input calculation path. MIPRO 2016/MEET A. DTC quantization error A DTC has finite resolution. To scale the output of the accumulated phase error to a digital tuning code, the output of the accumulator Acc in the Figure 2 needs to be 𝑇 multiplied by a factor (𝐿𝑆𝐵 𝑅𝐸𝐹 . The sampling ) 𝐷𝑇𝐶 ∙𝑁𝐹𝑅𝐴𝐶 moments hence occurs with accuracy limited by the LSB of the DTC and the resulting error is fed into the low pas filter (LPF), thereby instantaneously modulating the VCO, creating spurs. System level simulations show that choosing a DTC LSB of 0.5 ps ensures that the quantization noise appears below other loop noise. Moreover, a second ∆Ʃ modulator (Figure 2) is used in front of the DTC to shape the associated quantization noise beyond the PLL bandwidth. Thanks to the fact that the stream is perfectly accurate on average, the average PLL frequency is also accurate, with no visible modulation. 
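The delay computation of Figure 2 can be mimicked in a few lines: a first-order ΔƩ quantizer produces the integer stream, the zero-mean difference Diff is accumulated with phase wrapping, and the accumulator output Acc (in VCO periods) is scaled onto the DTC code range by TREF/(LSBDTC·NFRAC). This is a behavioural sketch with assumed parameter names; it uses a plain first-order modulator rather than the MASH 1-1-1 introduced in the next paragraph and omits the second ΔƩ that shapes the sub-LSB residue.

```python
F_REF   = 40e6        # reference frequency used in the paper [Hz]
N_FRAC  = 250.25      # example fractional multiplication ratio (assumed)
LSB_DTC = 0.5e-12     # DTC unit step [s]
T_REF   = 1.0 / F_REF

def dtc_code_stream(n_frac, n_cycles):
    """Behavioural model of the Figure-2 computation path (first-order case)."""
    frac   = n_frac - int(n_frac)   # fractional part of the multiplication ratio
    sd_err = 0.0                    # error-feedback state of the delta-sigma
    acc    = 0.0                    # "Acc": accumulated phase in VCO periods
    codes  = []
    for _ in range(n_cycles):
        # scale Acc onto the DTC input range: T_VCO = T_REF / N_FRAC
        codes.append(round(acc * T_REF / (n_frac * LSB_DTC)))
        # first-order delta-sigma: integer quantization of N with zero-mean error
        v      = frac + sd_err
        n_int  = round(v)
        sd_err = v - n_int
        acc    = (acc + frac - n_int) % 1.0   # accumulate "Diff" with phase wrapping
    return codes

# For a .25 fractional ratio the expected delay pattern 0, 1/4, 1/2, 3/4 of T_VCO
# (about 0, 50, 100, 150 DTC codes) appears, wrapping every fourth cycle.
print(dtc_code_stream(N_FRAC, 8))
```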
Another modification to the basic system that helps to mitigate the problem of limited DTC resolution, is to use a MASH modulator in the beginning of the delay computation path (initial ∆Ʃ modulator in Figure 2). A MASH modulator provides better randomization of the generated code, which helps with reducing spurious content. Compared to a first-order ∆Ʃ, the generated codes have a larger range, which results in a larger delay range of the DTC. For this reason we choose a 10-bit DTC implementation. By generating delays larger than one VCO period, it is possible to effectively de-color the sampling data. Moreover, randomizing DTC codes provides an effect similar to dynamic element matching. Since e.g. four DTC codes are used in MASH 1-1-1 mode to generate the same effective sampling phase, the apparent DTC nonlinearity is randomized. B. DTC gain error DTC gain can be defined as the amount of delay per least-significant bit (LSB) code. The DTC is analog in its nature and susceptible to PVT variations, hence its absolute gain is unknown and varying with time and temperature. Gain error in the delay steps introduces problematic spurs in the spectrum of the PLL. Automatic background calibration which tracks the gain variations becomes therefore absolutely necessary. An automatic DTC gain calibration can be designed similarly to the popular least-mean-square (LMS) based mechanisms used in digital PLLs [6] (Figure 3). Simply stated, it is possible to extract the sign of the sampled voltage and correlate it with a change in direction of the DTC word. Intuitively, if the modulator “tells” the DTC to sample later, but due to a gain error “early” samples get consecutively detected — it is possible to deduct that the DTC gain is too low. After accumulation, the correction word can be applied as a scaling factor to the computation path of Figure 2. When the correction loop converges, there is no penalty on phase noise. Figure 4 shows a simulation result where a 10% gain error was applied to the DTC. This error introduces a large ripple in the sampled voltage, which in turn results in large spurs at the output of the PLL. After the DTC gain is corrected, the sampled voltage converges back to zero. 73 Figure 3: DTC gain digital background calibration. Figure 4: DTC gain background calibration simulation with 10% gain error on the DTC. Figure 5: DTC INL calibration loop [7]. C. DTC nonlinearity The DTC nonlinearity, will naturally increase spurious content at the output of the PLL. Many techniques for improving linearity which are present for digital-to-analog converters (DACs) also apply for the DTC. For example, careful layout of the tuning element is of highest priority. Advanced nanometer scale technologies offer a significant advantage in this regard, thanks to ever-improving lithography resolution. Nevertheless, to ensure linearity of phase-error detection path, we employ DTC nonlinearity calibration loop. The calibration is based on Error Sign signal observation, which represents the sign of the instantaneous phase error. This 1-bit phase error is random and zero on 74 average in a linear system, but in presence of DTC INL it becomes “colored” by its nonlinearity. Essentially, we exploit the correlation between the Error Sign signal and the particular DTC input code which induced it, to restore the INL curves and pre-compensate for them. A look-uptable (LuT) with a set of coefficients c(0:k-1) is used to approximate the 10-bit DTC INL curve (where k is 32). 
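The gain correction described here is essentially a sign-sign LMS loop (the LuT-based INL correction detailed in the following passage reuses the same sign-correlation idea). The sketch below models only the gain loop; the update gain, the DTC gain error and the simplified error model are assumptions, not the authors' implementation. In particular, it assumes that the PLL absorbs the average timing error by shifting its lock point, so only the code-dependent part of the error is visible in the extracted sign.

```python
import random

K_TRUE = 1.10    # assumed actual DTC gain, 10 % too high as in the Figure-4 example
MU     = 1e-4    # LMS update gain (assumed)
N_CODE = 1024    # 10-bit DTC

gain_corr = 1.0  # digital correction factor applied in the code-computation path

def error_sign(code, corr):
    """Sign of the residual timing error seen by the sampler.

    Assumption: the PLL keeps the *average* error at zero, so only the
    code-dependent part of the gain error shows up in the error sign.
    """
    residual = (K_TRUE * corr - 1.0) * (code - N_CODE / 2)
    return 1 if residual > 0 else -1 if residual < 0 else 0

prev_code = 0
for _ in range(200_000):
    code = random.randrange(N_CODE)          # stands in for the DTC word stream
    direction = (code > prev_code) - (code < prev_code)
    gain_corr -= MU * direction * error_sign(code, gain_corr)
    prev_code = code

print(f"gain correction after LMS ~ {gain_corr:.3f} (expected ~ {1 / K_TRUE:.3f})")
```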
In every clock cycle, the input code addresses two neighboring LuT coefficients which piece-wise linearly approximate the expected, instantaneous INL error. The INL compensation value is simply subtracted from the original code, which ideally forces the DTC to produce no error (zero mean Error Sign) for the given code. The LuT correction coefficients are updated gradually, by integrating the scaled Error Sign value to the appropriate address (defined with the input code) in every cycle. When the calibration is initialized, the LuT is reset to zero. While the PLL operates, the coefficients c(0:k-1) slowly change towards their optimal calibrated values that cancel the INL error. At this moment, the coefficient updating can be disabled. The algorithm convergence speed is determined by the tap gain G, where a typical value of 2 -13 results in 10 ms approximate calibration time. The DTC INL calibration loop is enabled after the gain error is corrected. An important detail is that the offset in the extracted error sign needs to be digitally compensated for the loop to converge. D. DTC phase-noise In this paper we propose a solution to enhance an integer-N subsampling PLL by placing a phase modulator (DTC) in the path of the reference. Unfortunately, the phase noise contribution of the DTC adds directly to the phase-noise of the reference. Ultimately, the in-band phase noise of the subsampling PLL is limited by the phase noise of both the reference and the DTC, since both pass the system in the same way. Therefore, great care must be taken to minimize the DTC's contribution to phase noise, otherwise the unique phase noise advantages of the subsampling architecture will be lost. Here, scaling of CMOS technology is again on our side, since transistors are getting faster with every node, reducing jitter and phase noise. IV. CIRCUIT IMPLEMENTATION The subsampling phase locked loop can only detect phase error, which makes it susceptible to false locking at any N. Therefore, a frequency acquisition loop is required in addition to the subsampling loop [1] (see Figure 6). A conventional PLL easily fulfills this requirement. It can be disabled once frequency has been acquired in order to save power. Common to both frequency and phase acquisition loops are the low-pass filter (LPF) and the VCO. For the purpose of demonstrating the concept of the fractional-N subsampling PLL we have chosen the simplest LPF design—a passive third-order lead-lag filter. Tunable resistance in the LPF has been implemented to be able to change the bandwidth of the PLL. Such a simple filter can cause increase in reference spurs and is often avoided in classical charge-pump-based PLLs. Spurious content can MIPRO 2016/MEET increase because the varying level of tuning voltage can introduce mismatches between the currents of the charge pump. In a subsampling PLL, however, any offset in currents of is compensated by a slight modification of the locking point Figure 7. A locked condition always means zero output current of the transconductor(𝐺𝑀 ). If changes to the output level cause an input referred offset of the 𝐺𝑀 , the PLL will adapt its phase to compensate for this offset. A. Implementation of the Subsampling Loop The subsampling loop consists of a VCO buffer, a sampler and an 𝐺𝑀 . Additionally, the DTC provides the required phase modulation. Figure 8 shows the circuits along the subsampling path. 
A VCO buffer is required in order to reduce the kickback effect from the sampler to the VCO [5] and to interface the signal levels between the blocks. In this test chip, to accommodate for changing phase noise requirements of a software-defined radio, we have implemented a low-noise VCO that can be operated from a variable supply as high as 1.8 V. Therefore, the input buffer needs to convert the level between the high voltage VCO domain (max. 1.8 V) and the core domain (0.9 V). The buffer is implemented with a tunable capacitive attenuator and a source follower pair (Figure 8). The tunable attenuator is built with metal-oxide-metal (MOM) capacitors and provides additional tuning of loop gain. The buffer is also the largest contributor to power consumption in this loop, as it needs to process a GHz-range signal. The sampler is built around an NMOS switch and a small MOM capacitor. In total, taking into account the input capacitance of the 𝐺𝑀 , the sampling capacitance is 20 fF. Thermal kT/C noise can be neglected because it is already suppressed by the large detection gain. The implemented sampler uses an auxiliary sampler operating in inverted phase to the primary sampler in order to reduce load variability of the VCO. Figure 6: Complete system overview. Figure 8: Simplified block diagram of the subsampling loop. Since the implemented VCO can operate from the IO voltage (1.8V), the tuning voltage also has a range larger than the core voltage. Therefore, the output stage of the 𝐺𝑀 needs to provide translation from the low voltage domain of the sampler to the high voltage domain of the LPF and the VCO. Identically to [1], the phase-error detection gain is so large that duty-cycling is required in the output stage of the 𝐺𝑀 . Pulsing is done with a simple digital pulse generator that opens the output switches of the 𝐺𝑀 . An important part of the system is the background correction of DTC gain and nonlinearity. As said earlier, the error signal from within the PLL is present in the sign of the sampled voltage. However, this is true only if no mismatches are present in the system. If there are any mismatches in the phase detection circuitry the PLL will adjust the locking phase (and sampled voltage) so that the output current of the 𝐺𝑀 is zeroed (Figure 7). Therefore, the gain correction mechanism requires detection of the sign of output current. Using a simple clocked comparator to detect the sign of the swing in relation to Vtune voltage is sufficient to obtain information about the sign of the output current. B. Implementation of the Digital-to-Time Converter [8] Since the DTC is at the input of the system, its phase noise is multiplied by a square of the PLL multiplication number when transferred to the output (here: 48 dB as N = 250). On top of that, any kind of non-linearity present in the phase error comparison path leads to potential noise folding or spurs [9]. From the PLL system perspective, we target a 10-bit DTC, with a 0.5 ps unit step. This delay range covers multiple VCO periods allowing operation with the thirdorder MASH 1-1-1 modulator. Because the 0.5 ps step is very small and we know from system simulations that the PLL is sensitive to its disturbance, we can suspect that the DTC needs a good isolation from any noise coming from the supply. Implementation of the delay generator is shown in Figure 9 [8]. The first inverters in the chain serve as an input buffer towards the delay circuit loaded with a tunable MOM capacitance 𝐶𝐿 . 
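Incidentally, the 48 dB reference-path noise multiplication quoted at the start of this subsection for N = 250 is simply 20·log10(N), which the following one-liner confirms.

```python
import math

N = 250                                                # PLL multiplication factor from the text
print(f"20*log10(N) = {20 * math.log10(N):.1f} dB")    # -> 48.0 dB
```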
Figure 7: The subsampling PLL always locks into a state that guarantees zero output current, even in presence of offset and mismatch. Figure 9: Implementation of the DTC. To suppress mismatch-based errors for the chosen unit size, the capacitor array employs a 5-bit binary/thermometer segmentation. With a unit capacitor size of 3 fF, this ensures statistical DNL errors below 0.5 LSB. The 10-bit array is placed in a common-centroid layout to avoid systematic nonlinearities. Only the high-to-low transition of the $V_X$ voltage is important, because the subsampling loop reacts only to the closing of the sampling switch. One could realize the discharging of the load capacitance using a simple NMOS transistor; however, this would lead to an excessive $1/f$ noise contribution, which would dominate the PLL phase noise. To reduce this effect we introduce a resistor above the NMOS. The exponential discharge is then determined by the corresponding RC time constant. The delay is, however, a linear function of the capacitance. The resistor sets the discharging slope and hence contributes to the output phase noise, but it generates no $1/f$ noise. Furthermore, any supply ripple coming from the preceding buffer only modulates the NMOS switch resistance, which is an order of magnitude smaller than the discharging resistor and therefore does not affect the delay. The phase noise level introduced by the delay generator can be derived as
$\mathcal{L}_{white} \approx 10\log\!\left(\dfrac{f_{out}\,kT\,C_{Load}\,R^2}{V_{DD}^2}\right) \approx 10\log\!\left(\dfrac{f_{out}\,kT\,\tau_{delay}\,R}{V_{DD}^2}\right),$ (1)
where R is the resistor value, VDD is the supply voltage of the delay element and fout is the output frequency. Based on (1) and the targeted minimal delay step of 0.5 ps, we size R = 180 Ω and C = 3 fF to keep the noise of this stage at −160 dBc/Hz for the maximal delay. The RC delay control block is followed by a CMOS inverter serving as a comparator to restore steep slopes. The toggling moment of this circuit unfortunately depends on the shape of the input slope, which degrades the linearity of a high-range DTC. Care must also be taken because the regeneration of the RC-delayed slope is most vulnerable to supply modulation. A tunable regulated supply, shown in Figure 9, is used to protect the supply of the comparator and the following buffer. The regulated supply consists of a constant current source biasing a diode-connected transistor. A capacitor of 4 pF is used for additional decoupling of the regulated supply node. At the moment of toggling, charge is instantaneously pulled from this capacitor, and not from the VDD. The dip in the regulated supply voltage is suppressed by the gain of the current source before reaching the top supply. The dynamic charge flow is in this way kept within the structure itself.
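Because the delay is set by an RC discharge down to the toggling threshold of the following inverter, the code-to-delay characteristic is linear in the switched capacitance. The sketch below only illustrates that scaling; the comparator threshold and the fixed parasitic load are assumed values, so the absolute step size is indicative rather than the nominal 0.5 ps.

```python
import math

R        = 180.0      # discharge resistor from the text [Ohm]
C_UNIT   = 3e-15      # unit capacitor of the 10-bit array [F]
C_FIXED  = 20e-15     # fixed/parasitic load capacitance (assumed)
V_DD     = 0.9        # core supply [V]
V_TH_CMP = 0.45       # assumed toggling threshold of the restoring inverter [V]

def delay(code):
    """Exponential RC discharge until the inverter threshold is crossed."""
    c_load = C_FIXED + code * C_UNIT
    return R * c_load * math.log(V_DD / V_TH_CMP)

lsb  = delay(1) - delay(0)          # delay per code step: R * C_UNIT * ln(VDD / Vth)
full = delay(1023) - delay(0)       # full range covers several ~100 ps VCO periods
print(f"delay step ~ {lsb * 1e15:.0f} fs/LSB, full range ~ {full * 1e12:.0f} ps")
```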
The VCO is designed to operate with a low drop-out linear regulator (not present on chip) with a supply between 0.9V and 1.5 V, depending on the required phase noise performance and available power. D. Frequency acquisition loop The frequency acquisition loop has been implemented with a chain of divide-by-2/3 circuits, a traditional 3-state phase frequency detector (PFD), enhanced with a large deadzone [1] following and a very simple charge pump. The first stage of the divider is made with current mode logic, since the VCO frequency can reach 12 GHz, but the following stages of the divider are standard CMOS gates. Once the frequency acquisition is complete, the loop automatically becomes inactive thanks to the increased dead-zone in the PFD and can be completely shut down, saving power. In general, the loop components for both the phase and the frequency acquisition loop can be made very simple and do not require neither good precision, nor good matching, nor low noise. V. EXPERIMENTAL RESULTS The prototype IC was fabricated in TSMC 28nm bulk digital CMOS technology, and its size is 0.77 mm2 (excluding IO ring). It operates on 0.9 V and 1.8 V supplies (IO interface and the Gm stage). Figure 10: Die microphotograph. The measured power consumption is 5.6mW in total, of which 1.8mW is for the loop components, 2.7mW for MIPRO 2016/MEET the VCO, and 1.1mW for digital circuitry that all runs on reference clock of 40 MHz. The VCO tuning range is 10.1 – 12.4 GHz. The measured in-band phase noise around a close-to-integer fractional 11.72 GHz carrier is -107.9 dBc/Hz (Figure 11). The measured RMS jitter is 198 fs after calibration is enabled, with an integration range from 10 Hz to 40 MHz and all spurs (worst fractional) included. The measured spurious performance is shown in Figure 12. The worst fractional spur before calibration appears at -41 dBc but drops with 15.6 dB after calibration to -56.6 dBc. The integer spur is at -69 dBc. The PLL achieves one of the best reported FOMs in the recent state-of-the-art: 246.6 dB. VI. Table 1: Performance summary and comparison to the state-of-art. CONCLUSION A subsampling PLL is an architecture that offers extremely low phase noise, however, in its original form it is limited only to integer-N frequency multiplication. We presented a fractional-N subsampling PLL which operates based on a low-noise, low-quantization error DTC. The DTC is enhanced by digital background calibration which suppresses gain and nonlinearity issues. In this way we enhance the original phase noise performance of an integer-N subsampling PLL for fractional-N synthesis, without drawbacks of additional noise folding and spurs. [1] X. Gao et.al., “A Low Noise Sub-Sampling PLL in Which Divider Noise is Eliminated and PD/CP Noise is Not Multiplied by N^2,” Solid-State Circuits, IEEE Journal of, vol. 44, no. 12, pp. 3253–3263, 2009. [2] C.-W. Yao et.al., “A low spur fractional-N digital PLL for 802.11 a/b/g/n/ac with 0.19 ps RMS jitter,” in VLSI Circuits (VLSIC), 2011 Symposium on, 2011, pp. 110–111. [3] C.-W. Yao et.al., “A 2.8-3.2-GHz Fractional-Digital PLL With ADC-Assisted TDC and Inductively Coupled Fine-Tuning DCO,” 2013. [4] K. Raczkowski, et.al., “A 9.2-12.7 GHz wideband fractional-N subsampling PLL in 28 nm CMOS with 280 fs RMS jitter,” Solid-State Circuits, IEEE Journal of, vol. 50, no. 5, pp. 1203– 1213, 2015. [5] X. Gao et.al., “Spur reduction techniques for phase-locked loops exploiting a sub-sampling phase detector,” Solid-State Circuits, IEEE Journal of, vol. 45, no. 9, pp. 
1809–1821, 2010. [6] D. Tasca et.al., “A 2.9-4.0-GHz Fractional-N Digital PLL With Bang-Bang Phase Detector and 560,” Solid-State Circuits, IEEE Journal of, vol. 46, no. 12, pp. 2745–2758, 2011. [7] N. Markulic et.al., “9.7 A self-calibrated 10Mb/s phase modulator with -37.4dB EVM based on a 10.1-to-12.4GHz, 246.6dB-FOM, fractional-N subsampling PLL,” in 2016 IEEE International Solid-State Circuits Conference (ISSCC), 2016, pp. 176–177. [8] N. Markulic et.al., “A 10-bit, 550-fs step Digital-to-Time Converter in 28nm CMOS,” in European Solid State Circuits Conference (ESSCIRC), ESSCIRC 2014-40th, 2014, pp. 79–82. [9] S. Levantino, et.al., “An adaptive pre-distortion technique to mitigate the DTC nonlinearity in digital PLLs,” Solid-State Circuits, IEEE Journal of, vol. 49, no. 8, pp. 1762–1772, 2014. [10] B. Hershberg et.al., “A 9.1-12.7 GHz VCO in 28nm CMOS with a bottom-pinning bias technique for digital varactor stress reduction,” in European Solid State Circuits Conference (ESSCIRC), ESSCIRC 2014-40th, 2014, pp. 83–86. Figure 11: Measured phase noise at the PLL output. Figure 12: Measured fractional spur level at the PLL output. MIPRO 2016/MEET 77 Infrared Protection System for High-Voltage Testing of SiC and GaN FETs used in DC-DC Converters Filip Hormot, Josip Bačmaga, Adrijan Barić University of Zagreb, Faculty of Electrical Engineering and Computing, Unska 3, 10000 Zagreb, Croatia Tel: +385 (0)1 6129547, Fax: +385 (0)1 6129653, e-mail: filip.hormot@fer.hr Abstract—This technical paper presents the design and testing of the protection system for evaluation of high-voltage devices and circuits, e.g. SiC and GaN devices or high-voltage switching DC-DC converters. The high-voltage testing area is protected from invading by an array of infrared (IR) emitters and detectors. Whenever an object passes through the IR protected space, a high-voltage DC source is disconnected from its power supply and harmful effects of high DC voltage are avoided. The functionality of the developed system is tested and its characteristic response timings are measured. Index Terms—hazardous DC voltage, high-voltage switch-mode power converter I. I NTRODUCTION The silicon FETs are being used for many years as power devices in switching DC-DC converters. However, power converters operating at high power levels and voltages of several hundreds of volts are not achievable using the existing silicon power devices [1]. Instead, the silicon carbide (SiC) power devices are being used in high-voltage power converters due to their high voltage breakdown and thermal conductivity [2]. Furthermore, gallium-nitride (GaN) FETs due to their extremely low gate charge and output capacitance can operate at higher switching frequencies [3] compared to the silicon FETs. Increase of a switching frequency reduces the size of external passive components which results in a higher power density and efficiency of the DC-DC converter. High-voltage SiC and GaN FETs used in DC-DC converters are subjected to the voltage levels of several hundreds of volts during the evaluation of their operation [4], [5]. If a human body comes into contact with the device under test (DUT) during a high-voltage testing procedure, a harmful effects of high voltage can occur. The effect of a DC electric shock is determined by the amplitude of the current through the human body and the duration of the shock. The DC current amplitude of 300 mA is considered as a safety limit for a human body [6]. 

The DC voltage that will produce a current of a 300 mA through the human body depends on the human body resistance [7]. It can be seen in Fig. 1 that if 50% of human body serves as a current path, the DC voltage of approximately 400 V will cause a maximum allowable current of 300 mA through a human body. Therefore, the DC voltages typically used to test the operation of DC-DC converters based on SiC and GaN power devices are hazardous for a human. 78 Fig. 1. Human body resistance as a function of body voltage [7]. In this paper, a protection system for high-voltage testing of SiC and GaN power switches in DC-DC converter applications is presented. The system is based on infrared (IR) detection of invasion into the high-voltage testing area. If the testing area is invaded, the protection system disconnects the high DC voltage supply from the DUT and the harmful effects of a DC voltage to a human body are avoided. The overview of the developed infrared protection system is shown in Section II. Section III presents the evaluation of the protection system. Section IV concludes the paper. II. OVERVIEW OF THE P ROTECTION S YSTEM A. Infrared Protection of the Testing Area The top view of the high-voltage testing area is shown in Fig. 2. The testing area is enclosed by the 700-mm high fence attached to the 40-mm square tubes. The 820-mm wide entrance monitored by the IR protection system is left open in the fence for a user to arrange a measurement set-up. The mechanical sketch of the IR protected space is shown in Fig. 3. The 16 IR emitters [8] and 16 photodiodes [9] are uniformly distanced and placed in a “zigzag” order for effective monitoring of the entrance to the testing area. During the normal operation of the protection system, the photodiodes receive IR signal generated by the IR emitters. Once the IR protected space is invaded, the MIPRO 2016/MEET 820 820 850 high-voltage testing area IR protected space 40×40 Fig. 2. Top view of the testing area. All dimensions are in milimeters. W = 820 21.8 43.7 H = 700 14×43.7 21.8 Fig. 3. Mechanical sketch of the space monitored by the IR protection system. All dimensions are in milimeters. The distances between the diodes are rounded to the first decimal. transmission of the IR signal between the IR emitters and one or more photodiodes will be broken. This will cause a decrease of the current through the non-illuminated photodiodes large enough to produce a voltage difference that is detected by the input stage of the electronic control system. B. Principle of Operation of the Protection System The simplified schematic of the protection system shown in Fig. 4 can be divided into three function blocks: • input stage that detects an invasion into the testing area, • logic stage that sets the operation mode of the system and indicates the invasion of the testing area, • output stage that disconnects the high-voltage DC supply from the DUT. The input stage of the developed system consists of a voltage divider with variable resistors R1 to R16 and reverse-biased photodiodes DP H 1 to DP H 16 . The current through each of 16 reverse-biased photodiodes IP H i (i = 1, 2, ..., 16) is larger than zero when all the photodiodes are illuminated by the IR signal generated by the IR emitters. If any of the photodiodes looses the IR signal from the emitter, its current falls to zero. 
The voltage across each of the 16 photodiodes is
$V_{PH,i} = V_{DD} - I_{PH,i} \cdot R_i, \quad i = 1, 2, \dots, 16$ (1)
where VDD is a 5-V supply, VPH,i is the voltage across the photodiode, IPH,i is the current through the photodiode and Ri is the resistance of the corresponding variable resistor. When the photodiode is not illuminated by the IR signal, the voltage across it equals VDD. The voltage across the photodiode decreases as the photodiode gets more illuminated. The voltages VPH,i are compared to the reference voltage VREF using 16 comparator circuits. The value of VREF can be adjusted to achieve the required sensitivity of the IR detection. The voltages at the comparator outputs VCOMP,i (i = 1, 2, ..., 16) are
$V_{COMP,i} = \begin{cases} V_{DD}, & V_{PH,i} \le V_{REF}, \\ 0\ \mathrm{V}, & V_{PH,i} > V_{REF}. \end{cases}$ (2)
Since an invasion into the testing area has to be detected by any of the 16 photodiodes, all 16 VCOMP,i signals are combined into one control signal using two 8-input NAND gates and a 2-input OR gate. The control signal VCTRL is connected to the "set" input of the SR latch as shown in Fig. 4. The relationship between the control signal VCTRL and the VPH,i is
$V_{CTRL} = \begin{cases} 0\ \mathrm{V}, & V_{PH,i} \le V_{REF}\ (D_{PH,i}\ \text{illuminated}), \\ V_{DD}, & V_{PH,i} > V_{REF}\ (D_{PH,i}\ \text{not illuminated}). \end{cases}$ (3)
When the transmission of the IR signal to any one of the 16 photodiodes is interrupted, the control voltage VCTRL rises to VDD. This turns on the MOSFET Q1 and closes the normally-open contact of the relay K1 shown in Fig. 4. Closing the contact of K1 in turn closes the normally-open contacts of the relay K2, which disconnects the high-voltage DC source VDC from its power supply. The invasion into the testing area is indicated by the LED error indicator, while the disconnection of VDC from its power supply is indicated by the LED protection indicator shown in Fig. 4. The transistor Q1 is turned off again by pressing the push-button SRST, and VDC is reconnected to its power supply. The emergency stop switch in the output stage can be used independently of the IR detection. Pressing the emergency stop switch will also cause the relay K2 to disconnect the high-voltage DC source from its power supply. The top view of the assembled electronic control circuit with the main blocks indicated is shown in Fig. 5. III. EVALUATION OF INFRARED PROTECTION SYSTEM The functionality of the control circuitry is verified by applying a short circuit between the inverting comparator input and a ground terminal to imitate a fully illuminated photodiode, as shown in Fig. 6. When any of the comparator inputs is left open to imitate an invasion into the testing area, the control system triggers the on-state of the protection circuit and the voltage source that supplies the DUT is switched off. In order to test the functionality of the complete protection system, the IR emitters and photodiodes are placed on two 700-mm high tubes spaced 800 mm apart, as shown in Fig. 7. The system is calibrated by adjusting the variable resistors Ri (i = 1, 2, ..., 16) shown in Fig. 4 in
Fig. 4. Simplified schematic of the protection system. The system is divided into three function blocks: input, logic and output stage.
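A behavioural sketch of the input and logic stages of Fig. 4 (comparators, NAND/OR combination and SR latch) is given below. The photodiode current and resistor values are assumed, and all names are illustrative; only the relations (1)–(3) and the latch/reset behaviour follow the text.

```python
V_DD, V_REF = 5.0, 3.73          # supply and the adjusted comparator reference [V]

def photodiode_voltage(illuminated, i_ph=0.35e-3, r=10e3):
    """Eq. (1): V_PH = V_DD - I_PH * R; the current drops to ~0 when not illuminated.
    The current and resistor values are assumed, only the relation is from the paper."""
    return V_DD - (i_ph if illuminated else 0.0) * r

def comparator(v_ph):
    """Eq. (2): output high while the photodiode is illuminated (V_PH <= V_REF)."""
    return V_DD if v_ph <= V_REF else 0.0

def control_signal(illumination):            # illumination: 16 booleans
    comp = [comparator(photodiode_voltage(x)) > 0 for x in illumination]
    nand_a = not all(comp[:8])
    nand_b = not all(comp[8:])
    return V_DD if (nand_a or nand_b) else 0.0   # Eq. (3)

class ProtectionLatch:
    """SR latch driving Q1/K1/K2: set on invasion, cleared by the reset button."""
    def __init__(self):
        self.tripped = False
    def clock(self, v_ctrl, reset_pressed=False):
        if v_ctrl >= V_DD / 2:
            self.tripped = True        # disconnect V_DC from its power supply
        elif reset_pressed:
            self.tripped = False       # reconnect after S_RST is pressed
        return self.tripped

latch     = ProtectionLatch()
all_clear = [True] * 16
invaded   = [True] * 9 + [False] + [True] * 6           # one IR beam broken
print(latch.clock(control_signal(all_clear)))           # False: HV source stays connected
print(latch.clock(control_signal(invaded)))             # True: HV source disconnected
print(latch.clock(control_signal(all_clear)))           # still True until reset is pressed
print(latch.clock(control_signal(all_clear), reset_pressed=True))   # False again
```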
The developed protection system contains 16 identical input stages (i = 1, 2, ..., 16). The array of 16 photodiodes is illuminated by IR signal generated by 16 IR emitters. VREF adjustment IR emitter interface LED error relay indicators K1 multimeters IR emitters oscilloscope VDD supply photodiodes IR protected space Fig. 7. Set-up to test the functionality of the complete protection system. interface to the output stage photodiode interface with variable resistors Fig. 5. The top view of the assembled control circuit. VDD Ri VP H VCT RL VREF logic S R S GND VG Q osc. GND Fig. 6. Set-up to test the functionality of the control circuitry. The voltage waveforms are sensed using an oscilloscope (osc.). such a way that all the voltages across the photodiodes are the same during the normal operation of the system, i.e. when the testing area is not invaded. The reference voltage VREF is adjusted to a value of 3.73 V that is high enough to avoid the impact of the noise on the illumination of the photodiodes caused by other sources of light such as daylight or room lightning. The system is initially set in a normal mode of operation in which all the photodiodes are illuminated by the IR signal. The IR protected space is invaded and characteristic 80 transient response timings of the protection system are measured. A pre-trigger of 1-5 ms is set at the oscilloscope to record the complete transition of the system response when the space protected by the IR emitters and photodiodes is invaded. All waveforms are recorded using the Agilent MSO7034B oscilloscope and Agilent 10073D passive probe configuration as follows: 1:1 ratio, DC coupling, bandwidth limit: OFF, input impedance: 10 MΩ. The transient response waveforms of the control system when the IR protected space is invaded are shown in Fig. 8. The voltage across the photodiode VP H rises to the value of VDD after the invasion is detected. When VP H crosses the value of VREF , the voltages VCT RL at the comparator output and VG at the gate of the Q1 instantaneously rise to the value of VDD . The time-delay of the control circuitry, i.e. the time between the detection of the invasion into the testing area and start of the turn-on process of Q1 is denoted as ∆tCT RL1 in Fig. 8. The major part of ∆tCT RL1 is the rise time of the voltage across the photodiode VP H which depends on how fast an object passes through the IR protected space. If the object slowly invades into the testing area, it will take more time for the photodiode to become completely non-illuminated and for the VP H to reach the VREF . The time-delay of the digital MIPRO 2016/MEET 5 2.5 0 0 5 7.5 10 12.5 15 17.5 20 17.5 20 ∆tCT RL1 = 1.75 ms 2.5 7.5 5 10 12.5 time [ms] 15 17.5 20 VP H [V] 2.5 15 VG [V] 5 2.5 0 0 12.5 5 5 2.5 0 0 2.5 0 0 ILOAD [A] VP H [V] VCT RL [V] VG [V] 5 VREF = 3.73 V 2.5 invasion into the testing area 0 2.5 7.5 10 0 5 1 0.5 0 0 VREF = 3.73 V 10 20 30 invasion into the testing area 40 50 60 ∆tCT RL2 = 32 ms ∆tK 10 20 30 40 50 60 50 60 ∆tT OT = 40 ms 10 20 30 40 time [ms] Fig. 8. Transient response of the control circuitry when IR protected space is invaded. Top to bottom: voltage across the photodiode, voltage at the output of the comparator and voltage at the gate of the Q1 . Fig. 9. Transient response of the complete system when IR protected space is invaded. Top to bottom: voltage across the photodiode, voltage at the output of the comparator and current through the dummy load. logic parts and SR-latch are negligible. 
A test case in which a dummy load used to imitate the output stage is being disconnected from the voltage source is designed to evaluate the response of the IR protection system. An invasion into the testing area is simulated and the voltage across the photodiode VP H , voltage VG at the gate of Q1 and the current through the dummy load IL are recorded and shown in Fig. 9. The total time-delay of the protection system ∆tT OT is the sum of the time-delay of the control circuitry ∆tCT RL2 and the time ∆tK between the beginning of the turn-on process of Q1 and fall of the IL to 50% of its value. Since the simulated invasion into the testing area is much slower than for the test case shown in Fig. 8 in order to reproduce the worst-case-scenario, the time-delay of the control circuitry ∆tCT RL2 is much larger than ∆tCT RL1 . The value of ∆tK is independent of the speed of invasion into the testing area and equals 8 ms. The total time-delay ∆tT OT between the detection of the invasion into the testing area and break of the output high-voltage circuit is approximately 40 ms. The largest permissible duration of the electric shock with the most harmless effects on human body is 50 ms as specified in [7]. The power dissipation of all 16 IR emitters and 16 photodiodes when they are illuminated is 1 W while the dissipation of the control circuitry is approximately 35 mW. permissible duration of the electric shock that is specified by safety regulations. IV. C ONCLUSION A protection system is developed to ensure safety of a user during a high-voltage testing of power devices such as SiC and GaN FETs used in switching DC-DC converters. The system is based on infrared detection of the invasion into the testing area in order to turn-off the high-voltage source that supplies the device under test. The functionality of the system is evaluated for different test cases and characteristic response timings are measured. The time-delay between the detection of an invasion and activation of the protection circuit is lower than the highest MIPRO 2016/MEET ACKNOWLEDGMENT This work was supported in part by the Croatian Science Foundation (HRZZ) within the project Advanced design methodology for switching dc–dc converters. R EFERENCES [1] J. Biela, M. Schweizer, S. Waffler, and J. W. Kolar, “SiC versus Si—Evaluation of Potentials for Performance Improvement of Inverter and DC–DC Converter Systems by SiC Power Semiconductors,” IEEE Transactions on Industrial Electronics, vol. 58, no. 7, pp. 2872–2882, July 2011. [2] B. J. Baliga, Fundamentals of Power Semiconductor Devices, 1st ed. Springer, 2008. [3] Narendra Mehta, GaN FET module performance advantage over silicon. Texas Instruments, March 2015, Application Note. [4] O. Mostaghimi, N. Wright, and A. Horsfall, “Design and Performance Evaluation of SiC Based DC-DC Converters for PV Applications,” in IEEE Energy Conversion Congress and Exposition (ECCE), Sept 2012, pp. 3956–3963. [5] R. Mitova, R. Gosh, U. Mhaskar, D. Klikic, M. Wang, and A. Dantella, “Investigations of 600-V GaN HEMT and GaN Diode for Power Converter Applications,” in IEEE Transactions on Power Electronics, vol. 29, no. 5, October 2013, pp. 2441 – 2452. [6] L. Gordon, “The physiological effects of electric shock in the pulsed power laboratory,” in Pulsed Power Conference, 1991. Digest of Technical Papers, June 1991, pp. 377 – 380. [7] C. H. Lee and A. P. S. 
Meliopoulos, “Comparison of touch and step voltages between IEEE Std 80 and IEC 479-1,” IEE Proceedings - Generation, Transmission and Distribution, vol. 146, no. 6, pp. 593–601, Nov 1999. [8] OSRAM. (2014) High Power Infrared Emitter SFH 4550 FA Product Details. [Online]. Available: http://www.osramos.com/Graphics/XPic3/00116140 0.pdf [9] ——. (2014) Silicon PIN Photodiode SFH 213 FA Product Details. [Online]. Available: http://www.osramos.com/Graphics/XPic5/00101689 0.pdf 81 Optimal Conduction Angle of an E-pHEMT Harmonic Frequency Multiplier Krunoslav Martinčić, Zagreb University of Applied Sciences, Electrical Engineering dept., Zagreb, Croatia e-mail: krunoslav.martincic@tvz.hr Abstract - This paper describes a classic method of an analog harmonic multiplier which uses a C class a mplifier to pology. The g oal of this w ork is to propose the method of fi nding the optimal conduction angle of a E-pHEMT trans istor so that a better spectral purity of the second or third harmonics can be obtained. M ATLAB simulations are performed and the electronic circuit with an E-pHEMT trans istor is designed . Measurements are mad e in both ti me and frequency domain and a comparsion of the results is presented. INTRODUCTION With the development of modern technologies, the characteristics of electronic components are greatly enhanced along with increasing operating frequency of electronic circuits. The rapid expansion of information and communication technology leads to the congestion of currently used frequency bands and also to a need for higher operating frequencies. Highly stable and in the harmonicspectrum pure frequencies in higher GHz [1], [2] and THz [3] bands are practically impossible to generate directly in present time. Higher harmonic components of the fundamental frequency can be simply obtained by applying a sine voltage onto an electronic component having nonlinear current-voltage characteristics. Unfortunately, the byproduct of the wanted higher harmonics is also a large number of unwanted harmonic components of the fundamental frequency. The paper describes a sine signal distortion method applied to finding the optimal waveform with the aim of gaining a single, specific higher harmonic component. Semiconductor electronic components possess highly nonlinear current-voltage properties and as such are suitable for generating higher harmonics of fundamental frequency [4], [5]. The aim of this paper is to apply simulation tools and measurement of devices in order to obtain the defined harmonic component. The focus in this paper is not on the circuit topology, conversion efficiency, bandwidth or optimisation of high frequency matching under the nonlinear operating conditions of semiconductor devices. filter. It is better if the side spectrum components: (k –2)ω, (k –1)ω, (k+1)ω and (k+2)ω are as small as possible. In this way, the unwanted components of the spectrum can be well suppressed by simpler, low order filters. Since the concept of spectral purity in literature is frequently linked to the notion of phase noise of an oscillator, the quantitative factor that describes the issues studied here can be named Harmonic Spectra Purity Factor (HSPF). Similarly to the distortion of linear systems, HSPF can be defined via the power of spectral components as: ( )= 82 ) ) ( ; , ∈ N, > 1, (1) where P(kω) = P(ω k) is the power of the k-th harmonic ω k. If the signal only contains the component ω k, then HSPF(ω k) = 1, i.e. the signal is 100% spectrally "clean". 
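Definition (1), and the band-limited variants HSPF2 and HSPF4 introduced next, can be evaluated directly from an FFT of the clipped drain current. The sketch below assumes that the two-octave window runs from ⌊k/2⌋ to 2k, which is one reading of the description in the text, and uses an idealised C-class clipped sinusoid rather than the measured E-pHEMT characteristic, so the numbers are only indicative.

```python
import numpy as np

def clipped_sine_current(theta_deg, n=4096):
    """Idealised C-class drain current: conducts only over the angle theta."""
    phase = np.linspace(0, 2 * np.pi, n, endpoint=False)
    cos_half = np.cos(np.deg2rad(theta_deg) / 2)
    i = np.cos(phase) - cos_half               # conduction centred on phase = 0
    return np.where(i > 0, i, 0.0)

def harmonic_powers(signal, n_harm=20):
    spec = np.fft.rfft(signal) / len(signal)
    return np.abs(spec[1:n_harm + 1]) ** 2     # P(1w) ... P(n_harm*w)

def hspf(p, k):                                # Eq. (1): full-spectrum version
    return p[k - 1] / p.sum()

def hspf2(p, k):                               # two-octave window (assumed: k/2 .. 2k)
    lo, hi = max(k // 2, 1), min(2 * k, len(p))
    return p[k - 1] / p[lo - 1:hi].sum()

# Sweep a few conduction angles and compare the second-harmonic purity
for theta in (30, 70, 117, 170):
    p = harmonic_powers(clipped_sine_current(theta))
    print(f"theta = {theta:3d} deg  HSPF2(w2) = {hspf2(p, 2):.3f}  HSPF(w2) = {hspf(p, 2):.3f}")
```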
The same method of the quantitative description can be applied to other nonlinear systems, e.g. to mixers. Symmetrical (balanced) structures will suppress some of the unwanted products better than unbalanced structures. Also, with the same topology, a well balanced circuit will have higher HSPF than an unbalanced circuit. The relation (1) is unsuitable for application, therefore, in practice, it can be used in a simplified form taking into account the spectrum of only two or four symmetric octaves with respect to ω k. The components of the spectrum which are more distant from ω k are more easily filtered out and generally have smaller amplitudes. HSPF2 and HSPF4, defined by two and four octaves, are expressed in Equations (2) and (3). For rational sums index, the nearest lower whole number (integer) is to be taken. 2( )= 4( )= QUANTITATIVE MODEL The quantitative model proposed here can be applied to any harmonically pumped nonlinear system. If we focus solely on the concept of multiplication of a harmonic frequency of a sine signal, then the objective is to obtain one particular, higher harmonic kω (k ∊ N, k > 1), with the power P(kω) derived from the initial signal. Nonlinear sine wave distortion generates a multitude of harmonic products nω (n = 1, 2, 3, ...). It is possible to extract the target component kω from such a spectrum by using a suitable ( ∑ ( ∑ ) ( ) ( ∑ ) ( ) ; , ∈ N, > 1. (2) ; , ∈ N, > 1. (3) In a specific situation, HSPF will be represented by the function of a selected electronic component, static operating point, circuit topology, the amplitude of the driving signal and the temperature. MIPRO 2016/MEET expected in the measurement results compared to the simulation results. In the realistic setting of the dynamic conditions, there are parasitic serial resistances, capacitances, and inductances in the active component which influence the shape of the signal in the time-domain as well as the spectrum. Therefore, one can expect a small difference between the data obtained from simulation and the results of measurement, especially at higher frequencies. According to (4), (5) and (6), Id will depend on the gate bias voltage UGSQ and the amplitude of the driving signal ug. The three fundamentally different operating modes are possible: Fig.1, Measurement setup a) PHYSICAL REALIZATION Due to the simplicity in the setting of quiescent condition and theoretically simple transfer characteristic, the circuit is realized by using an enhanced pHEMT [6] transistor of high transconductance. The schematic of the circuit and the measurement setup is shown in Fig. 1. Firstly, the static transfer characteristic of the transistor at Uds = 4V has been measured. HF and microwave transistors generally possess a narow gate and, consequentially, linear transfer characteristic in the saturation region. Due to nonlinearity around the threshold voltage, UGS0 (VTH), the transfer characteristic is interpolated by a semi parabola: = 0.3 − 0.27 = 0, ≤ 0.27 V A, > 0.27 V and (4) (6). + In the frequency multiplication circuits with transistors, the most commonly used mode is a C-class since such a waveform contains a multitude of higher harmonic components of the spectral distribution [7]: cos (5). The gate voltage Ugs is a sum of the DC bias UGSQ and AC driving signal ug (Fig. 
1): = UGSQ ≤ UGS0, C-class, the transistor conducts for less than half of a period, with low side clipping, b) UGSQ > UGS0, non-linear operation, the transistor conducts for more than half of a period, depending on the amplitude of the driving signal, clipping is possible both on the bottom and the top side, c) UGSQ >> UGS0, depending on the amplitude of the driving signal, mainly saturation, and high side clipping. The measured and interpolated transfer functions are shown in Fig. 2. Some smaller differences are observed between the two curves, therefore, some deviations are ( =2 − , , )~ = 1 2 1−( , ≤ , 2 , ) =2 (7) , , ≥ 1, − . UGS0 = VTH = 0.27 V Uds = 4 V Fig. 3, Spectrum normalized to P(f1) for θ = 30º MATLAB simulation (FFT) of spectrum for narrow half- Fig. 2, E-PHEMT, Id(Ugs), Uds = const. MIPRO 2016/MEET sinusoidal pulse train (θ << 2π) is shown in Fig. 3 and for large conduction angle (θ ≈ π) in Fig. 4. The waveform and the spectrum depend on the gate bias voltage UGSQ and the amplitude of the driving signal ug. According to (4), (5) and (6) both of these variables unambiguously determine the conduction angle θ of the transistor. If the amplitude of the driving signal ug is held constant and the gate bias voltage UGSQ is being changed (Fig. 1), the clipping level and 83 Fig. 6, Id(t), θ = 120º, UGSQ = 0.12 V, P in = - 10dBm, fin = 100MHz Fig. 4, Spectrum normalized to P(f1) for θ = 170º consequently the conduction angle can be adjusted. For this reason, the simulation and the spectrum analysis by MATLAB have been done at a conduction angle θ as an independent variable. SIMULATIONS AND MEASUREMENTS With the constant amplitude ug, and by changing the gate bias voltage UGSQ, with the aid of a resistive divider R1 - POT1, according to Fig. 1, it is possible to adjust the Fig. 7, Spectrum, measured for θ = 120º compared to the prior case of the conduction angle θ = 170°, Fig. 4. The same procedure has been performed in the case of a frequency tripler, and Fig. 8 shows the obtained Fig. 5, HSPF2(ω 2) as a function of θ conduction angle from 0º to 360º. With smaller conduction angles, the total power of the spectral components is smaller, too. The aim of this work is not the optimisation of power or efficiency, but the optimisation of the spectral composition. Once the wanted component has been separated from the spectrum and the unwanted components have been sufficiently suppressed, the signal level can be amplified to the defined level by a linear amplifier. In Fig. 4 (θ = 170°), the fundamental harmonic dominance in the spectrum can be observed. With a frequency doubler, it would be desirable that the amplitudes of the fundamental and third harmonics are as low as possible. The simulation of the conduction angle ranging from 0º to 180º has been performed in MATLAB. Fig. 5 shows HSPF2(ω 2) as the function of the conduction angle. The optimal ratio of the amplitudes of spectral components in terms of generating the second harmonic occurs at a conduction angle of θ ≈ 117º, i.e. one third of the period. This case is shown for time domain in Fig. 6, and for frequency domain in Fig. 7. There is a relative increase of the amplitude of the second harmonic with respect to the amplitude of the fundamental component 84 Fig. 8, HSPF2(ω 3) as a function of θ Fig. 9, Spectrum normalized to P(f1) for θ = 70º dependence of HSPF2(ω 3) on the conduction angle. The graph shows the maximum for a conduction angle of 71º, MIPRO 2016/MEET i.e. ≈ 1/5 of the period. Fig. 
9 shows the spectrum obtained by computer simulation for the referred case. From the graphs, a relative increase can be observed in the power of the third harmonic compared to the first and second harmonics, relative to the prior case of the frequency doubler with conduction angle θ = 120°.

CONCLUSION

This paper presents a model for the quantitative description of harmonically pumped nonlinear systems. With the aid of MATLAB, an analysis is conducted, including the calculation of the proposed factors for the case of a frequency multiplier implemented using an E-pHEMT in a C-class amplifier. The measurement results have confirmed the computer simulation results and have also justified the use of the proposed factors. In future work, the intention is to carry out an analysis of circuits with an exponential current-voltage characteristic, as well as of symmetrical topologies with suppression of the fundamental component.

REFERENCES
[1] Yuan Chun Li, "20–40 GHz dual-gate frequency doubler using 0.5 µm GaAs pHEMT technology", IEEE, Electronics Letters, vol. 50, no. 10, 2014.
[2] C. Rauscher, "High Frequency Doubler Operation of GaAs Field Effect Transistors", IEEE Trans. on Microwave Theory and Techniques, vol. 31, no. 8, June 1983.
[3] T.W. Crowe, J.L. Hesler, S.A. Retzloff, C. Pouzou, G.S. Schoenthal, "Solid-State LO Sources for Greater than 2 THz", Virginia Diodes Inc., Charlottesville, VA, 2011 ISSTT Digest, 22nd Symposium on Space Terahertz Technology, April 26–28, 2011, Tucson, Arizona, USA, www.vadiodes.com
[4] S. A. Maas, "The RF and Microwave Circuit Design Cookbook", Artech House, Inc., Norwood, MA, 1998.
[5] Edmar Camargo, "Design of FET Frequency Multipliers and Harmonic Oscillators", Artech House, Norwood, MA, 1998.
[6] Samuel Y. Liao, "Microwave Devices and Circuits", Third Edition, Prentice Hall, Englewood Cliffs, NJ, 1996.
[7] S. A. Maas, "Nonlinear Microwave and RF Circuits", Artech House, Inc., Norwood, MA, 2003.

Ultra-Wideband Transmitter Based on Integral Pulse Frequency Modulator

T. Matić*, M. Herceg*, J. Job* and L. Šneler**
* Faculty of Electrical Engineering Osijek, Osijek, Croatia
** Supracontrol, Zagreb, Croatia
tmatic@etfos.hr

Abstract - The paper presents a novel short-range wireless sensor node architecture based on an Integral Pulse Frequency Modulator (IPFM) and an Ultra-Wideband (UWB) pulse generator. Due to the lack of an internal clock signal source and a multi-user implementation without a microprocessor, the architecture is simple and energy efficient. Multi-user coding is performed using delay elements, where each user has a unique delay time value. The output of the IPFM is fed to the delay element. The delayed and original signals from the IPFM feed the UWB pulse generator. The paper presents the transmitter architecture and measurements of the transmitter signals in the time and frequency domain.

I. INTRODUCTION

Wireless sensor networks have gained great attention in the last two decades. Following the development of semiconductor technology and modern CMOS mixed-signal circuit design, wireless sensor networks are becoming a core part of various modern communication systems [1-4]. Particularly important is their application in systems where the node size and the power consumption are the critical demands, like Wireless Body Area Networks (WBAN) [5-8]. In such networks, the size of the wireless node and its power consumption are limited, and it is important to achieve minimal size and the lowest possible consumption [9].
Ultra-wideband (UWB) communication systems have shown great promise in low power communication systems. Since the UWB communication systems are based on the narrow pulses, it provides energy consumption below 1 nJ per transmitted pulse. Depending on the application areas, for slowly varying signal measurements, where low duty cycle operation is enabled, wireless sensor node can provide extremely low power operation. In such a system, there is a promising application of energy harvesting power supplies that would enable battery-free operation. Wireless sensor nodes aimed to acquire a signal from an analog sensor are relaxing in terms on computational complexity. Due to analog input signal, they could consist of the simple analog-to-digital converter followed by an UWB generator employing suitable modulation for multiuser coding. Such a sensor could employ simple 1-bit analog-to-time conversion and delay based multiuser coding, as it is proposed in [10 and 11]. The proposed multi-user coding is based on transmitted reference ultra- 86 wideband modulation. The major difference is that it employs asynchronous transmission and no clock nor microprocessor is required at the transmitter side. Clock-less analog to time conversion can easily be implemented with Time Encoding Machine (TEM) [12] at the analog sensor output. The analog input signal is transformed to output pulse train that contains the information on analog input signal in time domain, like output pulse frequency, distance or duty cycle, depending on the type of TEM [11]. For energy efficient time-encoding, the best choice is pulse-based TEM, like Integrate and Fire (IAF) or Integral Pulse Frequency Modulator (IPFM) that transform analog input signal to pulse train with information in pulse distance/frequency. The maximum output signal value is present only during short term pulse duration, while the rest of the period output signal is equal to zero. Such an operation enables low duty-cycling and therefore low energy consumption. In next chapter, the paper presents the TEM theory, focusing on IPFM application for UWB wireless sensor application. The UWB IPFM transmitter is implemented as discrete-type laboratory prototype which can be used as a wireless sensor node at the analog sensor output. The third chapter present the measurement results. II. IPFM UWB TRANSMITTER A. Transmitter architecture The presented wireless transmitter is aimed for the application at the analog sensor output for remote signal acquisition. As it can be seen form Fig. 1, it transforms analog input signal to time information using TEM as a modulator. The pulse train y(t) is fed to the delay line that is used for multi user coding and optionally fed back to IPFM. Each sensor has unique delay value that is used for user identification at the receiver side. The delayed and original pulse train (y’(t) and y(t) signals) are fed to the UWB pulse generator where the pulse pair is formed for direct wireless transmission. The ultra-wideband generator can optionally be fed to the transmission line or antenna to achieve wireless or wired communication. In case of wireless transmission, the pulses are in high frequency bands over 3 GHz frequencies, while in wired mode communication, the frequency should be significantly lower to minimize losses and reflections. MIPRO 2016/MEET B. Time Encoding Machine The Time Encoding Machines transform the analog input signal to information at the output. 
The basic TEM circuits, according to [12], are the Asynchronous Sigma-Delta Modulator (ASDM) and the Integrate and Fire (IAF) modulator. Besides ASDM and IAF, the Integral Pulse Frequency Modulator can also be considered a TEM [11], since the distance between its consecutive output pulses is proportional to the analog input voltage value. When an ASDM is used as the TEM, the information on the analog input voltage is transformed into the output pulse train duty cycle [13]. Therefore, information on both the rising and falling edges is required for analog information recovery. Due to the more complex transmitter and receiver architecture required for transmission of positive and negative pulses, IPFM is more attractive for the transmitter implementation. Since the information is carried only in the pulse distances, only positive UWB pulses can be transmitted for reliable analog signal demodulation at the receiver side.

Figure 1. Block scheme of the IPFM UWB transmitter

The block scheme of the Integral Pulse Frequency Modulator is depicted in Fig. 2. The analog input signal defines the comparator threshold voltage. To enable a unipolar supply for bipolar input voltages, the input voltage span is transformed from [-VCC, VCC] to [0, VCC], where c = 0.5, while Sref is equal to VCC/2. The integrator L(s) from Fig. 2 integrates constantly until the integrator output l(t) reaches the comparator threshold. The comparator output y(t) then transitions, and the feedback loop signal y’(t) resets the integrator, which integrates from zero voltage until the next crossing occurs. The delay block can be implemented outside the feedback loop or inside it, as in Fig. 2. The consecutive pulse distance is proportional to the comparator threshold uTH(t) = x(t)/2 + VCC/2 for an ideal integrator L(s) with time constant τi. The integrator output signal in the time domain can be expressed by the following equation:

l(tk) = ∫[tk-1, tk] (VCC / τi) dt + Ck-1 . (1)

At the time tk, when the integrator triggers the comparator, under the assumption that at the beginning of the integration slope tk-1 the integrator output was equal to 0 (l(tk-1) = Ck-1 = 0), the following equation holds:

l(tk) = ∫[tk-1, tk] (VCC / τi) dt = uTH = x(tk)/2 + VCC/2 , (2)

if τi << (tk - tk-1). According to (2), the distance between consecutive output pulses Tk and Tk-1 of the signal y(t), if τn = 0, is equal to:

Tk − Tk-1 = (tk − tk-1) + TON + TOFF = (τi / (2VCC)) x(tk) + τi/2 + TON + TOFF , (3)

Figure 2. Block scheme of the IPFM modulator

where TON and TOFF are the on and off switching times of the integrator L(s) and their sum is equal to the output pulse width (TP = TON + TOFF). For τn ≠ 0, equation (3) becomes:

Tk − Tk-1 = (τi / (2VCC)) x(tk) + τi/2 + 2τn + TON + TOFF . (4)

If the integrator time constant τi, the multi-user delay τn and the switching delay times TON and TOFF are known at the receiver side, then according to (4) it is easy to demodulate the pulse train and obtain the analog input voltage value x(tk). The difference between two multi-user delays, τn - τn-1, should be high enough that variations in the integrator time constant τi and the delay times TON and TOFF do not affect detection at the receiver side.

C. Circuit implementation

The pulse train y(t) is applied to the delay line, which generates the delayed replica y’(t). In that way, two pulse trains are formed to feed the UWB pulse generator for the wireless pulse transmission. A discrete-type prototype is implemented, based on a modified version of the UWB generator published in [14]. The modification allows impulse pair generation independently of the rectangular pulse width.
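The following minimal sketch is based on the reconstruction of Eqs. (3) and (4) given above, so it inherits any uncertainty in that reconstruction. The supply voltage, integrator time constant, switching times and multi-user delay are arbitrary example values, and encode_distance()/decode_distance() are hypothetical helpers rather than part of the prototype.

```python
import numpy as np

# Idealized IPFM timing sketch based on Eq. (4) as reconstructed above:
# Tk - Tk-1 = (tau_i/(2*Vcc))*x(tk) + tau_i/2 + 2*tau_n + T_on + T_off
VCC, TAU_I = 3.3, 1e-6          # supply and integrator time constant (assumed)
T_ON = T_OFF = 5e-9             # integrator switching times (assumed)
TAU_N = 20e-9                   # multi-user delay of this node (assumed)

def encode_distance(x):
    """Pulse distance produced for an input sample x in [-Vcc, Vcc]."""
    return TAU_I / (2 * VCC) * x + TAU_I / 2 + 2 * TAU_N + T_ON + T_OFF

def decode_distance(T):
    """Receiver-side inversion of Eq. (4) for known circuit parameters."""
    return (T - TAU_I / 2 - 2 * TAU_N - T_ON - T_OFF) * 2 * VCC / TAU_I

for x in (-2.0, 0.0, 1.5):
    T = encode_distance(x)
    print(f"x = {x:+.2f} V -> T = {T*1e6:.3f} us -> x_hat = {decode_distance(T):+.2f} V")
```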
The implemented circuit is depicted on Fig.3. Instead of the operational amplifier (OPAMP) implementation, the simple RC-filter is used as integrator. One OPAMP is used as comparator within IPFM, while the other one is used as a comparator within delay circuit. Both 87 Figure 3. The circuit implementation of the IPFM UWB transmitter Figure 4. The output of the comparator y’(t) Figure 5. The step recovery diode voltage comparators trigger UWB generator inputs that are implemented with high frequency transistors that trigger step recovery diode connected to microstrip line. At each rising edge of both original and delayed pulse trains, the UWB generator will form an ultra-wideband pulse which are summed and form signal yUWB (t) to be sent wirelessly. The microstrip line is connected to antenna via capacitor. Since analog input voltage is unipolar signal, constant c = 1 and Sref is not connected. III. RESULTS The laboratory prototype from Fig.3 was implemented in discrete components implementation and for UWB pulse generation step recovery diode has been used. Due to constant forward biasing and constant current consumption this implementation is not energy efficient. To achieve low power operation IC based pulse generation techniques have to be used to ensure low duty cycling and zero output current in excess of UWB pulse [15 and 16]. The delayed comparator output pulse train y’(t) is depicted in Fig 4. Information on the input signal level is contained in the pulse distance Tk – Tk-1, which is equal to 500 ns in the present setup. 88 Figure 6. The output spectrum of the UWB pulse train Fig.5. presents the step recovery diode voltage. It is visible that the pulse pair reversely polarizes diode. At each reverse polarization pulse step recovery diode will MIPRO 2016/MEET form the narrow Gaussian pulse. The spectrum of the transmitted pulse is measured with receiving antenna at the 30 cm distance without LNA. Received power spectrum is depicted on Fig. 6. Due to limitations of the step recovery diode based UWB pulse shaper, the pulse in not positioned entirely in unlicensed spectrum. [4] [5] IV. CONLUSION The paper presents novel UWB IPFM transmitter for short range wireless communications. Architecture of the transmitting system is achieved without internal clock application nor microprocessor application for multi-user coding. Unique delay enables simple and efficient multiuser coding and UWB pulses provide low energy consumption per transmitted pulse. To achieve low power operation, future work is directed to CMOS IC implementation of the system which will enable low power operation in periods of non-activity between two consecutive UWB pulses. Downscaling the pulse width to sub 10 ps range, the integrator time constant could be lowered to enable integration of the capacitors on silicon as well. Also, IC design will provide more efficient spectral shaping and spectral positioning in unlicensed spectrum over 3 GHz due to delay-based UWB pulse generation and more accurate pulse shaping comparing to SRD implementation. ACKNOWLEDGMENT This work has been supported in part by Croatian Science Foundation under the project UIP-2014-09-6219 Energy efficient asynchronous wireless transmission. REFERENCES [1] [2] [3] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, “A survey on sensor networks,” IEEE Commun. Mag., vol. 40, no. 8, pp.102–114, Aug. 2002. 
Li-Yuan Chang; Pei-Yin Chen; Tsang-Yi Wang; Ching-Sung Chen, "A Low-Cost VLSI Architecture for Robust Distributed Estimation in Wireless Sensor Networks," in Circuits and Systems I: Regular Papers, IEEE Transactions on , vol.58, no.6, pp.12771286, June 2011. Bellasi, D.E.; Benini, L., "Energy-Efficiency Analysis of Analog and Digital Compressive Sensing in Wireless Sensors," in Circuits MIPRO 2016/MEET [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] and Systems I: Regular Papers, IEEE Transactions on , vol.62, no.11, pp.2718-2729, Nov. 2015. Bellasi, D.E.; Rovatti, R.; Benini, L.; Setti, G., "A Low-Power Architecture for Punctured Compressed Sensing and Estimation in Wireless Sensor-Nodes," in Circuits and Systems I: Regular Papers, IEEE Transactions on , vol.62, no.5, pp.1296-1305, May 2015. S. Diao, Y. Zheng, and C.-H. Heng, “A CMOS ultra low-power and highly efficient UWB-IR transmitter for WPAN applications,” IEEE.Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 3, pp. 200–204, 2009. Poon, C.C.Y.; Lo, B.P.L.; Yuce, M.R.; Alomainy, A.; Yang Hao, "Body Sensor Networks: In the Era of Big Data and Beyond," in Biomedical Engineering, IEEE Reviews in , vol.8, no., pp.4-16, 2015. Sarkar, S.; Misra, S., "From Micro to Nano: The Evolution of Wireless Sensor-Based Health Care," in Pulse, IEEE , vol.7, no.1, pp.21-25, Jan.-Feb. 2016. Ozols, K., "Implementation of reception and real-time decoding of ASDM encoded and wirelessly transmitted signals," in Radioelektronika (RADIOELEKTRONIKA), 2015 25th International Conference , vol., no., pp.236-239, 21-22 April 2015. Sarpeshkar, R., "Universal Principles for Ultra Low Power and Energy Efficient Design," in Circuits and Systems II: Express Briefs, IEEE Transactions on , vol.59, no.4, pp.193-198, April 2012. T. Matic, M. Herceg, J. Job, “Energy-efficient system for distant measurement of analogue signals”, WO/2014/195739, 11.12.2014. T. Matic, M. Herceg, J. Job, “Energy-efficient system for distant measurement of analogue signals”, WO/2014/195744, 11.12.2014. A. A. Lazar and L. T. Toth, “Time encoding and decoding of a signal”, US 7,573,956 B2, 11 August, 2009. Ouzounov, S.; Engel Roza; Hegt, J.A.; van der Weide, G.; van Roermund, A.H.M., "Analysis and design of high-performance asynchronous sigma-delta Modulators with a binary quantizer," in Solid-State Circuits, IEEE Journal of , vol.41, no.3, pp.588-596, March 2006. P. Protiva, J. Mrkvica, J. Macháč, Universal Generator of UltraWideband Pulses, Radioengineering, vol. 17, no. 4, December 2008. Xie, H.L.; Wang, X.; Wang, A.; Qin, B.; Chen, H.; Zhao, B., "An ultra low-power low-cost gaussian impulse generator for uwb applications," in Solid-State and Integrated Circuit Technology, 2006. ICSICT '06. 8th International Conference on , vol., no., pp.1817-1820, 23-26 Oct. 2006. Radic, J.B.; Djugova, A.M.; Videnovic-Misic, M.S., "A 3.1–10.6 GHz impulse-radio UWB pulse generator in 0.18µm," in Intelligent Systems and Informatics (SISY), 2011 IEEE 9th International Symposium on , vol., no., pp.335-338, 8-10 Sept. 2011. 89 Design of a transmitter for high-speed serial interfaces in automotive micro-controller A. Bandiziol*, W. Grollitsch**, F. Brandonisio**, R. Nonis**, P. 
Palestri* * Università degli Studi di Udine, Udine, Italy ** Infineon Technologies, Villach, Austria palestri@uniud.it Abstract - This work reports about the system level design of a transmitter for the next generation of High-Speed Serial Interfaces (HSSI) to be implemented in a micro-controller for automotive Electronic Control Unit (ECU) applications, pushing the transmission speed up to 10 Gbps over a 10cm long cable. A voltage mode architecture is selected for low power considerations. We focus our analysis here on the system-level implementation of Feed-Forward Equalization as an FIR filter consisting of different transmitter slices driven by different bits in the data sequence. We consider different data rates and number of taps and analyze how the performance of the equalizer is affected by the quantization of the values of the taps in the practical implementation of the FIR filter. I. INTRODUCTION High-speed digital I/O is becoming increasingly popular and emerged as one of the hot research topics in microelectronics. In this context, high-speed serial links are widely used in a variety of different fields [1]. Moreover, in the last twenty years the speed of these links has been constantly increasing, leading to the adoption of a broad variety of standards [2]. Since the baud rates at which these links work today are extremely high, inter-symbol interference (ISI) has become the most limiting factor. In order to mitigate ISI, the most effective solution is channel equalization [3]. Equalization can be done either at the transmitter side just before the channel (and it is called Feed-Forward Equalization, FFE) or at the receiver (for example in the form of Decision-Feedback Equalization, DFE). Figure 1 shows the general structure of a transmit- ter with feed-forward equalization: delayed versions of the bit stream are fed to drivers with different strengths, thus implementing an FIR filter. II. DRIVER ARCHITECTURE For the transmitter, a voltage mode architecture has been chosen, because it consumes less power than a current mode with the same output swing [4]. Figure 2 shows the general structure of a voltage-mode differential driver: two inverters are driven by bit bi and bi to produce a differential voltage vo. The MOSFETs are large enough to have a negligible voltage drop when on. Impedance matching is implemented by the resistancesRDi. Figure 2 is also the starting point to explain the basic principle of FFE. A single driver can be split into many slices, and the same bit bi of the serial data input can drive many of these slices, i.e. the different drivers in Figure1 are each formed by many slices with the schematics of Figure 2. This can be better understood in Figure 3. The index “i” of a bit indicates the position in the bit stream. When dividing the driver into slices, one must still guarantee impedance matching. In fact, when FFE is not implemented, only one slice might be used, and its output resistance should match R0=50Ω. If more slices are used, it has to be considered that the total resistance of all the slices in parallel should be 50Ω, leading therefore to bigger values for RDi of each slice. This results in the following equation ∑ i 1 RDi = 1 R0 (1) Under this assumption, one can then write a closed form expression for the output voltagevo, which reads v0 = VDD n ⋅ ∑ ( i ) ⋅ bi ⋅ sgn i M 2 i (2) where (see Figure 3) ni is the number of slides connected to the bit bi, M is the number of total slices and sgni is the sign of bi (i.e. 
putting bi at the left or right part of Figure 2 making the driver inverting or non-inverting). For the bi, ‘1’ corresponds to bi =1 and ‘0’ means bi =-1. Eq. (2) can be rewritten as Figure 1. General scheme of a serial link implementing FFE using driver slices 90 MIPRO 2016/MEET cess is the zero-forcing method [6]. Here the discussion will be at a tutorial level to bridge the gap between the theory in [6] and the actual hardware implementation (Figs. 2, 3). The goal is to minimize the distance between the desired response of the transmitter+channel (i.e. a signal without ISI) and that actually received; this is done via a Least Squares Minimization problem that reads min wZFE || z DES − H CH wZFE || 2 Figure 2. General scheme of a voltage-mode transmitter. v0 [ j ] = VDD 2 ∑ w ⋅ b [ j] = i i i (3) VDD 2 ∑ w ⋅ data[ j − i ] i i where j indicates a bit period and wi is the strength of the slices connected to the i-th bit normalized to the full driver strength. From now on, we will refer to wi as to the “weight” for the i-th tap. From Eq. (3) we see that the structure realizes an FIR filter, which implements a convolution between the bit stream (data[j]) and the tap vector wi. It should be noted that in order to obtain fine impedance tuning and compensate PVT (Process, Voltage and Temperature) variations, one does not strictly follow Eq. (1): the driver (divided in slices) is sized to obtain a resistance much larger than 50Ω; then many replicas of the structure are duplicated and the number of such replicas put in parallel is adjusted to match the 50Ω target [5]. III. CHOICE OF EQUALIZATION TAPS In this paragraph, we will cover the basic steps that are required to determine the weights wi for FFE. The approach that we follow in our equalization pro- Figure 3. Example of FFE implementation using slices (with schematics as in Figure 2) put in parallel. MIPRO 2016/MEET (4) where wZFE is the weights vector [w0,w1,…,wi,…wN] to be determined by the minimization problem andzDES is the desired output response. In other words, if we consider a bit stream ‘10000’, the transmitter will generate a sequence of pulses of height [w0,w1,…,wi,…wN], each one stimulating the channel. We want to set wi in such a way that the receiver samples the original sequence ‘10000’ without ISI, meaning that zDES =[1,0,0,…,0.]. HCH is a matrix that rearranges the channel pulse response (h) in order to transform the convolution with the different pulses of height wi into a matrix product. For example, if the channel pulse response is not null only for the first three samples and we decide to equalize with 5 post-cursor taps, then we have h0 h  1 h2  HCH =  0 0  0 0  0 h0 0 0 0 0 h1 h2 0 0 0 h0 h1 h2 0 0 0 h0 h1 h2 0 0 0  0  0 h0   h1  h2  (5) The solution of the minimization problem introduced by Eq. (4) requires extracting the channel pulse response. Therefore the simulation setup shown in Figure 4 is used. This setup is implemented in Ansys Electronic Desktop [7] and consists of a differential link with a pulse generator (with ideally steep rise and fall times) and an Sparameter block. This block changes from system to system and represents all the elements that compose the system after the transmitter. In this work we analyze two different systems, which we will refer as “BGA system” and “Leadframe system”. 
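Before the two channels are described, a small sketch of the zero-forcing tap computation of Eqs. (4) and (5), together with the least-squares solution discussed below, may help connect the matrix formulation to an implementation. The channel pulse response used here is invented, not the measured BGA or Leadframe channel, and ffe_taps() is a hypothetical helper.

```python
import numpy as np

# Sketch of the zero-forcing computation: build the channel convolution
# matrix H_CH from a sampled pulse response and solve the least-squares
# problem min ||z_des - H_CH w||_2 for the FFE weights.
def ffe_taps(h, n_taps, n_pre=0):
    """Return FFE weights for n_taps taps, n_pre of them pre-cursor."""
    n_out = len(h) + n_taps - 1
    H = np.zeros((n_out, n_taps))
    for i in range(n_taps):                 # each column = shifted pulse response
        H[i:i + len(h), i] = h
    z_des = np.zeros(n_out)
    z_des[n_pre] = 1.0                      # desired response: '1' then zeros
    w, *_ = np.linalg.lstsq(H, z_des, rcond=None)
    return w

h = np.array([0.60, 0.25, 0.10, 0.05])      # hypothetical channel pulse response
w = ffe_taps(h, n_taps=4)                   # main cursor plus three post-cursors
print("weights:", np.round(w, 4))
print("equalized response:", np.round(np.convolve(h, w), 4))
```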
The Ball Grid Array (BGA) system is composed by a via, approximately 5 mm long, which connects the transmitter output signal to the package output, a BGA package, a Printed Circuit Board (PCB) and a cable (10 cm long). At the receiver end, the Figure 4. Setup implemented in Ansys Electronic Desktop in order to extract the channel pulse response 91 Figure 8. Setup implemented in Ansys Electronic Desktop in order to extract the obtain the eye diagram of the system Figure 5. S21 of the BGA System zDES = [1,0,0,0]. If pre-cursor taps are used, then the zDES elements are shifted to the right by a number of places equal to the number of pre-cursor taps. IV. Figure 6. S21 of the Leadframe System impedance is matched at 100Ω differential and the differential output is measured via voltage probes. The Leadframe System is similar to the BGA one, but instead of a BGA package it uses a leadframe one and the cable lowpass characteristic has been mimicked by lumped elements. Figs. 5 and 6 show the S21 of both systems. The solution of the Eq. (4) reads wZFE = ( H CH ⋅ H CH ) −1 ⋅ H CH ⋅ z DES T T Once the optimal weights have been found, one can then evaluate the effectiveness of FFE not only by looking at the channel pulse response (as in Figure 7), but also by simulating a structure equivalent to the system composed by transmitter, package, chip and channel and evaluating the improvements obtained in the eye diagram. The software used to this end is Ansys Electronic Desktop. Figure 8 illustrates a typical simulation setup: the only difference to the structure presented in Figure 4 is the use of a PRBS (Pseudo Random Bit Sequence) generator instead of a pulse generator. The PRBS generator offers the possibility to adapt its output based on equalization weights inserted by the user: in this case the vector wZFE obtained with Eq. (6). The low and the high voltage levels have been set to -450mV and 450 mV, respectively. Thermal noise typical of MOSFETs in the TX has not been modeled and will be one of next steps of our work. (6) which is the well-known solution for the LS objective function of Eq.4. Once the weights wZFE have been obtained, one can evaluate how close the overall transmitter+channel response fits the wanted zDES by checking it against the product HCH ⋅ wZFE (i.e. the overall response of the FFE+channel). This is shown in Figure 7, which compares the effect that equalizations with different number of taps has on the overall pulse response of the channel of Figure 6. In this particular case (consistent with the description above), only post-cursor taps were used, therefore the desired impulse channel response would be a 1 followed by a number of zeroes equal to the post-cursor taps used. So, for a 4-taps equalization (main and three post-cursors) the desired channel pulse response would be Figure 9. Eye diagram at 2.5 Gbps for BGA System obtained with a transsitor level simulation and one tap de-emphasis. The transmitter is described in [8]. Channel Pulse Response 1 Output Voltage [V] EXAMPLE WITH REALISTIC CHANNELS Pulse Pulse Pulse Pulse Pulse Pulse 0.8 0.6 Response pre-FFE Response 2-post Response 3-post Response 4-post Response 5-post Response 6-post 0.4 h(5) 0.2 0 h(-2) h(-1) h(0) h(1) h(2) h(3) 20.8 21 21.2 h(4) 21.4 21.6 21.8 Time [ns] 22 22.2 22.4 22.6 Figure 7. Response of channel of Figures 4-6 operating at 10Gbps along with responses after equalization for various numbers of taps 92 Figure 10. 
Eye diagram at 2.5 Gbps for BGA System obtained with a PRBS generator and one tap de-emphasis. MIPRO 2016/MEET The channel is terminated with a 50 Ohm resistance. In order to confirm the validity of our simplified approach, in Figs. 9 and 10 two eye diagrams of the same system (at 2.5 Gbps) are shown. One obtained with a PRBS generator and the other one with a transistor level model of the transmitter: the eye diagram parameters are very similar. Since the system level design is performed before actually designing the TX at transistor level, in the following, for the 5 Gbps and the 10 Gbps cases only the PRBS source is used. Figs. 11, 12, 13 and 14 report the eye diagrams in such cases with and without equalization. For the 5 Gbps situation, the improvement due to FFE is marginal, whereas to work at 10 Gbps with the Leadframe System, FFE is mandatory. Note that, at given VDD, the inclusion of FFE lowers the high and low levels of the eye (e.g. 400mV vs. 300mV in Figs.9 and 10): this is because when FFE is implemented, some slices will be driven by bits having opposite sign with respect to the main one, and this implies that the driver is not working at full strength. V. Figure 11. Eye diagram at 5 Gbps for Leadframe System without FeedForward Equalization. EFFECT OF TAP QUANTIZATION Eq. (6) provides optimum tap weights, but one must also think at a real world implementation, which obviously implies quantizing these obtained weights since each bit will be connected to a finite number of slices. This problem is peculiar to the voltage-mode transmitter divided in slices. In fact previous works already introduced equalization implemented with sliced drivers, but mainly in current mode logic [9]-[10], which makes equalization easier to implement with high granularity. For this reason, we analyzed the effect of quantization with two different granularities, 8 and 16 levels (i.e. M=8 or M=16 as in Figure 3). These two different granularities Figure 12. Eye diagram at 5 Gbps for Leadframe System with 6 postcursor taps. The weights that generate this eye are w0=0.8009, w1=0.0403, w2=-0.1018, w3=0.0395, w4=-0.0045, w5=-0.013. Figure 15. Eye diagram at 10 Gbps for Leadframe System with 6-post cursor taps and quantization step of 1/8. The weigths are w0=0.625, w1=0.125, w2=0, w3=0, w4=-0.125 and w5=0.125, which correspond to 5 slices Figure 13. Eye diagram at 10 Gbps for Leadframe System without FeedForward Equalization. connected to b0 while b1 , b4 and b5 require one slice each. Figure 16. Eye diagram at 10 Gbps for Leadframe System with 6-post cursor taps and quantization step of 1/16. The weights are w0=0.5625, w1=0.1875, w2=0, w3=0.0625, w4=-0.125 and w5=0.0625, which correspond to Figure 14. Eye diagram at 10 Gbps for Leadframe System with 6 postcursor taps. The weights that generate this eye are w0=0.577, w1=-0.1526, 9 slices connected to b0, three connected to b1 , one slice connected to b3, w2=0.0148, w3= 0.0579, w4=-0.1392 and w5=0.0585. one to b5 and two to b4 . MIPRO 2016/MEET 93 offer quantization steps of 0.0625 and 0.125 respectively. Figs. 15 and 16 show the effect of quantization on the operation at 10 Gbps of Leadframe System when equalized with 6 post-cursor, which without quantization has already been shown in Figure 14. With 16 slices we obtain eye parameters very close to what is obtained from Eq. (6), whereas with 8 slices there is a degradation of the eye. VI. 
CONCLUSIONS We have reported on the design of a high speed transmitter at 10 Gbps, focusing on the system level planning of the feed-forward-equalization. It has been shown that for a realistic channel typical of automotive ECU applications, communication at 10 Gbps requires 6 taps. The driver has to be partitioned in at least 16 slices in order to have the requested granularity of the taps for the FIR filter. If the transmission speed gets even higher, then a higher number of taps is needed and the effect of quantization becomes more and more relevant. The eye parameters with and without FFE and including different granularity in the tap quantization are summarized in Figs. 1720. These will be checked against experimental data once the test chip will be available. Transistor level design with an advanced CMOS technology is ongoing and the results will be published later on. We include in these figures also a 15 Gbps situation that requires 9 taps for equalization when considering the Leadframe System (7 for the BGA System). In Figs. 17-20 we see that FFE improves the eye height and width, although part of the improvement is lost if a too coarse granularity is used for thewi. ACKNOWLEDGMENT The authors would like to thank Prof. L. Selmi (University of Udine) for support and for many helpful discussions. REFERENCES [1] Carusone, Tony Chan. "Introduction to Digital I/O: Constraining I/O Power Consumption in High-Performance Systems." SolidState Circuits Magazine, IEEE, vol. 7, no. 4 (2015): 14-22. [2] Chang, Ken, Geoff Zhang, and Christopher Borrelli. "Evolution of Wireline Transceiver Standards: Various, Most-Used Standards for the Bandwidth Demand." Solid-State Circuits Magazine, IEEE, vol. 7, no. 4 (2015): 47-52. [3] Bulzacchelli, John F. "Equalization for Electrical Links: Current Design Techniques and Future Directions." Solid-State Circuits Magazine, IEEE, vol. 7, no. 4 (2015): 23-31. [4] Razavi, Behzad. "Historical Trends in Wireline Communications: 60X Improvement in Speed in 20 Years." Solid-State Circuits Magazine, IEEE, vol. 7, no. 4 (2015): 42-46. [5] Kossel, Marcel, et al. "A T-coil-enhanced 8.5 Gb/s high-swing SST transmitter in 65 nm bulk CMOS with 16 dB return loss over 10 GHz bandwidth." Solid-State Circuits, IEEE Journal of, vol. 43. no. 12 (2008): 2905-2920. [6] Proakis, John G. Intersymbol interference in digital communication systems. John Wiley & Sons, Inc., 2001. [7] Ansys Inc. PDF Documentation for Release 15.0 [8] Cossettini, A., et al. "Design, characterization and signal integrity analysis of a 2.5 Gb/s High-Speed Serial Interface for automotive applications overarching the chip/PCB wall." Research and Technologies for Society and Industry Leveraging a better tomorrow (RTSI), 2015 IEEE 1st International Forum on. IEEE, 2015. [9] Bulzacchelli, J. F., et al. "4.1 A 10Gb/s 5-Tap-DFE/4-Tap-FFE Transceiver in 90nm CMOS." Solid-State Circuits, IEEE Journal of, vol. 41. no. 12 (2006): 2885-2900. [10] Beukema, Troy, et al. "A 6.4-Gb/s CMOS SerDes core with feedforward and decision-feedback equalization."Solid-State Circuits, IEEE Journal of 40.12 (2005): 2633-2645. Quantization Step 1/16 Quantization Step 1/16 Quantization Step 1/8 Quantization Step 1/8 Figure 17. Eye height versus transmission speed for Leadframe System Figure 19. Eye height versus transmission speed for BGASystem when not when not equalized, optimally equalized and when quantization is applied equalized, optimally equalized and when quantization is applied to wi. to wi. 
Quantization S tep 1/16 Quantization S tep 1/8 94 Quantization Step 1/16 Quantization Step 1/8 Figure 18. Eye width versus transmission speed for Leadframe System Figure 20. Eye width versus transmission speed for BGASystem when not when not equalized, optimally equalized and when quantization is applied equalized, optimally equalized and when quantization is applied to wi. to wi. MIPRO 2016/MEET Application of the calculation-experimental method in the design of microwave filters A.S. Geraskin*, A.N. Savin*, I.A. Nakrap**, V.P. Meshchanov*** National Research Saratov State University/ Faculty of Computer Science and Information Technologies, Saratov, Russian Federation ** National Research Saratov State University/ Faculty of Physics, Saratov, Russian Federation *** Yuri Gagarin State Technical University of Saratov/ Institute of electronic engineering and mechanical engineering, Saratov, Russian Federation Gerascinas @mail.ru, savinan@info.sgu.ru, nakrapia@info.sgu.ru, nika373@bk.ru * The estimation of the applicability of calculationexperimental method to optimize the design of microstrip microwave filters on the example of the band-pass filter on the half-wave resonators with Chebyshev characteristic. This method is based on an iterative process of the correction of the result designed device parameters synthesis in his experimental output characteristics. It is shown that the method allows to consider the influence of various factors relating to the manufacturing process and features of the materials used. However, an additive method of accounting in the target function of deviations from given output characteristics does not provide a rapid convergence process at large deviations. In order to increase the efficiency of calculation experimental optimization method is offered its modification, consisting in additional correction parameters of materials and mathematical models of components of the experimental output characteristics of the device. Accounting irregularities of the relative dielectric permittivity and the thickness of the microstrip filter components in its design with the help of a modified method has allowed for a single iteration to reduce the unevenness of the reflection coefficient module of the filter with 7.1 dB to 1.7 dB. I. INTRODUCTION At the present stage of science development and technology are widely used various microstrip microwave devices. The development of such devices assumes the numerical or physical modeling of the processes occurring in them, and following optimization of the device in order to achieve the required values of its output parameters [1]. Design of microwave devices, as a rule, is carried out with the use of specialized computer-aided design [2]. However, the real parameters of the microstrip microwave devices can be significantly different from designed parameters. This can be due to various factors: inhomogeneity of parameters of the materials used, MIPRO 2016/MEET deviations microstripes size and quality of their surface, and inaccurate mathematical models of the device components. In this connection arises accounting task of indicated factors during development. In [3] propose calculationexperimental method of parametric optimization (CEMO), allowing to solve the given task based on the use of experimental output characteristics of microwave devices. Purpose of the given work is to study the possibility of applying CEMO at designing of microstrip microwave filters. II. 
THE ALGORITHM OF THE CALCULATIONEXPERIMENTAL METHOD BASED ON CHEBYSHEV APPROXIMATION OF THE OUTPUT CHARACTERISTICS In the process of designing microwave devices, as a rule, solves the problem of parametric synthesis of the vector v variable parameters of the device in order to minimize the deviation of the output characteristic of the from the given characteristics of in the frequency domain : (1) The implementation process is carried out usually with the use of specialized tools computer-aided design (for example, NI AWR Microwave Office [2]) using various approximate mathematical models of components for microwave devices design. Accordingly, the resulting output characteristic may differ from the experimental characteristics of a real device ( is the optimal vector of parameters in the device synthesized in the solution of the task (1)). Proposed in [3] CEMO is based on correction results of the synthesis using the experimental device characteristics. Thus the physical implementation of the device is considered as an intermediate step of the iterative process of its development. You can select the following steps of the algorithm of this method [3, 4]: 95 1. 2. 3. Synthesizing optimal vector of the microwave device parameters for a given frequency domain the characteristic and the calculation of the output characteristics (i.e. the solution of task (1)) with the use of computer-aided design. Making of microwave devices designed for the model and measuring the real characteristics at the points of the frequency range , corresponding to the output characteristics of the Chebyshev approximation. The use of Chebyshev approximation allows the use in the synthesis process of the vector the minimum number of points for a given accuracy of the approximation and correspondingly the minimum number of experimental data. The construction new target characteristics , takes into account the difference between the calculated and experimental output characteristics is carried out by the specular reflection amendment relative to the origin given characteristics : (2) The use of output characteristics for a given correction characteristics allows to take into account inaccuracies of the mathematical model of the output characteristics and the features of the materials used and the manufacturing process of the microwave devices. 4. 5. 6. Synthesis of new optimal vector parameters of microwave devices by new a given characteristic and the calculation of the output characteristics (i.e. solution of task (1) with a modified target function). Thus as the initial approximation used the earlier resulting vector . Making the new device with the parameter vector , and to measure its output characteristics . If the obtained output characteristic the required accuracy coincides with a given output characteristic , the process ends with the development of the device, otherwise returns to step 3 with the use of (2) a correction value in the formation of new characteristics and a given vector as a first approximation for synthesizing the device settings at step 4. In [3] it is shown that in most cases only one of the calculation-experimental cycle for the satisfactory coincidence of the functions with a given . In order to assess the applicability of this method in the designing microstrip microwave filters was carried out 96 to develop a band pass filter fourth order on half-wave resonators by the given algorithm. III. 
DEVELOPMENT OF MICROSTRIP MICROWAVE FILTER USING COMPUTATIONAL EXPERIMENTAL METHOD Developed microstrip microwave filter should have the following characteristics:          type of filter – band-pass; center frequency ; the boundary frequencies of passband: , ; the non-uniformity of modulus of transmission coefficient of the filter in a passband of not more than ; the module of the reflection coefficient filter in the passband must have the minimum possible deviation from the theoretically calculated and the minimum non-uniformity, measured by the maximum in bandwidth; the filter should be connected to microstrip lines with a characteristic impedance ; the required transfer function must be the Chebyshev polynomial of the fourth order; mathematical models of the components of the filter must have the ultimate shape; the substrate material of the filter – two-sided coated fibreglass FR4-2 c parameters: , , , . Filter topology on half-wave resonators, working principle of which is discussed in detail in [5], is shown in Fig. 1. a) b) Figure 1. Topology of the developed filter In accordance with steps 1 and 2 CEMO in the program NI AWR Microwave Office (Trial version) was synthesized microstrip band pass filter fourth order on half-wave resonators with Chebyshev transfer function, satisfying the above requirements and made its prototype. The vector of optimal parameters of the filter are given in section a) Table. 1. Calculated and experimental –parameters of the filter shown in Fig. 2. MIPRO 2016/MEET A sufficiently large difference between the experimental characteristics of the filter calculated from “Fig. 2” may be related to characteristics of the technology used in the manufacture of filter inhomogeneity relative permittivity of the substrate and the thickness of microstrips “Fig. 1b”, the quality of the surface, and inaccurate mathematical model of the filter, taking into account in the calculation of -wave. On the basis of the detailed analysis of the influence of the dimensions of the filter to half-wave resonators in its output characteristics, is given in [5], were defined varying parameters in the vector : - length of microstrip all components of the filter and – gaps in segments of the associated lines “Fig. 1”. New vector , the optimal filter parameters, obtained by synthesis on a given new feature given in item b) of the table. 1. Output characteristics of the layout of the filters with the parameters defined by the vectors , is shown in Fig. 3. TABLE I. № compon ent of filter Figure 2. Calculated (   , - - - -) and experimental (♦, - -♦- -) -parameters of the filter parameters synthesized in NI AWR according to specified requirements Further optimization of the filter design for the purpose of reduction of reflection coefficient module layout of the experimental to the calculated mean was carried out in accordance with steps 3 – 5 CEMO. 
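To make the bookkeeping of steps 1 to 4 concrete, here is a toy sketch of one CEMO iteration. It reads the mirror correction (2) as F_new = F_given − (F_measured − F_calculated), which is only one plausible reading of the "specular reflection" step, and the scalar "device" with a constant fabrication offset is invented purely for illustration.

```python
import numpy as np

f = np.linspace(1.0, 2.0, 11)            # frequency grid in GHz (arbitrary)
F_given = np.full_like(f, -20.0)         # given target level in dB (arbitrary)
offset = 3.0                             # unmodelled fabrication/material error

calc = lambda v: np.full_like(f, v)      # idealized CAD model of the device
meas = lambda v: calc(v) + offset        # what the fabricated prototype does

def synthesize(F_target):
    """Steps 1 and 4: 'synthesis' = fit of the single model parameter."""
    return float(np.mean(F_target))

v1 = synthesize(F_given)                 # step 1: initial design
F_new = F_given - (meas(v1) - calc(v1))  # step 3: corrected target characteristic
v2 = synthesize(F_new)                   # step 4: re-synthesis against F_new
for v in (v1, v2):
    print("max deviation from F_given:", np.max(np.abs(meas(v) - F_given)))
```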
In accordance with the requirements for the formation of a new specified in the expression (2) as the output and set characteristics of the considered module of the reflection coefficient calculated filter, and the output characteristics of PARAMETERS OF THE SYNTHESIZED FILTERS , mm a) , mm , mm The filter parameters characteristic , mm , mm , synthesized by a given 1, 5 1.5 1.9 0.326 17.24 0.035 4.4 2, 4 1.5 2.7 1.383 16.73 0.035 4.4 3 1.5 2.7 1.883 17.08 0.035 4.4 6, 7 1.5 2.856 8.606 0.035 4.4 b) The filter parameters , synthesized by a given characteristic 1, 5 1.5 1.9 0.326 17.25 0.035 4.4 2, 4 1.5 2.7 1.386 16.72 0.035 4.4 3 1.5 2.7 1.914 17.09 0.035 4.4 6, 7 1.5 2.856 8.83 0.035 4.4 c) The filter parameters , synthesized by a given characteristic in view of inhomogeneity and 1, 5 1.5 1.9 0.326 17.24 0.039 4.283 2, 4 1.5 2.7 1.383 16.73 0.033 4.507 3 1.5 2.7 1.883 17.08 0.040 4.348 6, 7 1.5 2.856 8.606 0.035 4.727 d) The filter parameters , synthesized by a given characteristic in view of inhomogeneity и 1, 5 1.5 1.9 0.345 17.351 0.039 4.283 2, 4 1.5 2.7 1.371 16.605 0.033 4.507 3 1.5 2.7 1.880 17.113 0.040 4.348 6, 7 1.5 2.856 6.900 0.035 4.727 The difference between the outer lateral maximums characteristic filter with parameter vector is 5 . The difference for the characteristic with the vector parameters is equal “Fig. 3”. As you can see, using one iteration, CEMO, to improve the characteristics of the filter failed. This may be explained by the wrong choice at step 4 variable filter parameters, do not provide in this case a consideration of the factors discussed above affecting the output characteristic. Figure 3. Experimental -parameters of the filter with the parameters of (•, - -•- -) and (♦, - -♦- -), obtained CEMO experimental – of experimental filter, obtained in steps 1 and 2. Thus the calculated and experimental was set to points of the frequency range used Chebyshev approximation “Fig. 2”. MIPRO 2016/MEET For the purpose of solving this task was proposed to modify steps 3, 4 CEMO by using prior synthesis to adjust the parameters of the materials used and, if necessary, the study of mathematical models of components of the device according to the experiment proposed in [6] for the synthesis of waveguide microwave systems [7]. This idea has been used to optimize the filter developed by adjusting the technological parameters of 97 the experimental layout: the relative dielectric constant of the substrate and the thickness of microstrips. Modified CEMO step 3 (step 3’) consists of the solutions of problem (1) synthesis of the optimal correcting vector in which the varied parameters are only and elements, and the remainder obtained in step 1 correspond to the vector . At the same time as the given characteristics of is used characteristic output experimental – , measured in step 2. Modified CEMO step 4 (step 4’) solves the problem (1) synthesis of new optimal vector , where as the original CEMO, varied parameters are – length of microstrip all components of the filter and – gaps in segments of the associated lines [5], but and elements obtained in step 3’, correspond to the vector . As specified is used as output estimated characteristics – , calculated in step 1. The difference for the characteristic with the vector parameters is equal “Fig. 4”. As can be seen, the use of a mathematical model of the filter to adjust the parameters of the substrate according to the experimental data allowed us to align filter. 
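A possible reading of the modified step 3' is a least-squares adjustment of the material parameters so that the circuit model reproduces the measured response before the geometry is re-synthesized. The sketch below uses a toy one-resonance model and invented parameter values; model() stands in for the CAD model and is not an electromagnetic solver.

```python
import numpy as np
from scipy.optimize import least_squares

def model(geometry, eps_r, thickness, f):
    """Toy filter response: resonance set by eps_r, level set by thickness."""
    f0 = geometry / np.sqrt(eps_r)
    depth = -20.0 * (thickness / 0.035)
    return depth / (1.0 + ((f - f0) / 0.05) ** 2)

f = np.linspace(2.2, 2.6, 81)                  # GHz, arbitrary grid
geometry = 2.4 * np.sqrt(4.4)                  # geometry synthesized for eps_r = 4.4
measured = model(geometry, 4.3, 0.039, f)      # prototype behaves like eps_r = 4.3

def residual(p):
    eps_r, thickness = p                       # only material parameters vary
    return model(geometry, eps_r, thickness, f) - measured

fit = least_squares(residual, x0=[4.4, 0.035], bounds=([3.5, 0.02], [5.0, 0.06]))
print("corrected eps_r, thickness:", np.round(fit.x, 3))
```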
The relative deviations of parameters from nominal substrate for filter components are small “Table. 2” and correspond to the tolerances on the characteristics of the material used. Thus, the proposed modification based on the correct mathematical model of the filter on its real parameters, greatly improves the efficiency of CEMO. IV. CONLUSION For example, the development of bandpass microstrip microwave fourth-order filter on the half-wave resonators with Chebyshev characteristic evaluated the effectiveness calculation-experimental optimization method, founded on an iterative process of correction of the results of the developed filter synthesis using his experimental output characteristics. It is shown that the method allows to take into account the influence of various factors relating to process of manufacturing and features of the materials used. However, additive methods of accounting in the target function of deviations from given output characteristics do not provide a rapid convergence process at large deviations. To reduce the unevenness of the reflection coefficient module of the filter for one iteration failed. Figure 4. Experimental -parameters of the filter with the parameters of (•, - -•- -) and (♦, - -♦- -), the obtained modified CEMO Further in accordance with steps 5 and 6 CEMO the manufacture of the new layout of the filter with parameter vector and to measure its output characteristics . Obtained in steps 3’, 4’ corrective and new vector of optimal parameters of filter are given in item c) and d) of the table. 1. Output characteristics of the layout of the filters with the parameters defined by the vectors and , is shown in Fig. 4. TABLE II. DEVIATIONS OF SUBSTRATE PARAMETERS FROM THE With the aim of increasing the effectiveness of calculation-experimental method of optimization is offered his modification, consisting in an additional adjustment of materials parameters and mathematical models of components of experimental output characteristics of the device. The proposed modification of the method can be used for a wide class of microstrip devices. Accounting irregularities of the relative dielectric permittivity and the thickness of the microstrip filter components in its design with the help of a modified method has allowed for a single iteration to reduce the unevenness of the reflection coefficient module of the filter with 7.1 dB to 1.7 dB. ACKNOWLEDGMENT Work is performed with financial support of Ministry of Education and Science of Russian Federation within the project part of state assignment in the field of scientific activity #3.1155.2014/K. REFERENCES NOMINAL № component of filter 1, 5 2, 4 3 6, 7 ,% -2.7 2.4 -1.2 7.3 ,% 11.4 -5.7 14.3 0 The difference between the outer lateral maximums characteristic filter with parameter vector is 5 . 98 [1] T. C. Edwards, M. B. Steer, “Foundations for Microstrip Circuit Design, Fourth Edition”, Hardcover - Wiley & Sons Ltd, 2016. [2] http://www.awrcorp.com/products/microwave-office. [3] B. M. Kac, V. P. Meshchanov, A. L. Feldshtein, “Optimalnyj sintez ustrojstv svch s T volnami./ Pod red. V. P. Meshchanov, M.:Radio i svyaz, 1984. 288 s. MIPRO 2016/MEET [4] A. M. Bogdanov, M. V. Davidovich, B. M. Kac i dr. “Sintez sverhshirokopolosnyh mikrovolnovyh struktur./ Pod red. A. P. Krenickogo i V. P. Meshchanova, M.: Radio i svyaz, 2005, 514 s. [5] M. I. Nikitina, “Sistema proektirovaniya mikropoloskovyh polosno-propuskayushchih filtrov”, dis. kand. tekh. nauk, Krasnoyarsk, Krasnoyarskij gos. tech.un-t, 1998, 151 s. [6] A. 
N. Savin, “Issledovanie ehlektrodinamicheskih harakteristik struktur vakuumnoj ehlektroniki i magnitoehlektroniki svch na MIPRO 2016/MEET osnove regressionnyh modelej”, dis. kand. fiz. – mat. nauk, Saratov, Saratovskij gos. un-t, 2003, 184 s. [7] I. A. Nakrap, A. N. Savin, and Yu. P. Sharaevskii, “Modeling of Wideband Slow-Wave Structures of Coupled-Cavities-Chain Type by the Impedance Design of Experiments”, Journal of Communications Technology and Electronics, 2006, Vol. 51, No. 3, pp. 316–323. 99 Minimax Design of Multiplierless Sharpened CIC Filters Based on Interval Analysis Goran Molnar1, Aljosa Dudarin1, and Mladen Vucic2 1 Ericsson Nikola Tesla d. d. Krapinska 45, 10000 Zagreb, Croatia 2 University of Zagreb, Faculty of Electrical Engineering and Computing Department of Electronic Systems and Information Processing Unska 3, 10000 Zagreb, Croatia E-mails: goran.molnar@ericsson.com, aljosa.dudarin@ericsson.com, mladen.vucic@fer.hr AbstractPolynomial sharpening of amplitude responses of cascaded-integrator-comb (CIC) filters is often used to improve folding-band attenuations. The design of sharpened CIC (SCIC) filters is based on searching the polynomial coefficients ensuring required magnitude response. Several design methods have been developed, resulting in real, integer, and sum-of-powers-of-two (SPT) coefficients. The latter are preferable since they result in multiplierless structures. In this paper, we present a method for the design of SCIC filters with SPT polynomial coefficients by using the minimax error criterion in folding bands. To obtain the coefficients, we use a global optimization technique based on the interval analysis. The features of the presented method are illustrated with the design of wideband SCIC filters.  I. INTRODUCTION Cascaded-integrator-comb (CIC) filter [1] is common building block in digital-down converters. However, in processing of wideband signals, the CIC filter is often incapable of meeting the requirement for high folding-band attenuations. To improve the CIC-filter folding-band response, various structures have been developed. An efficient structure arises from the polynomial sharpening of CIC response [2]. This structure implements so called sharpened CIC (SCIC) filter. The design of sharpened CIC filters is based on searching the polynomial coefficients ensuring required magnitude response. Several design methods have been developed, resulting in real, integer, and sum-of-powersof-two (SPT) coefficients. The latter are preferable since they result in multiplierless structures. Well-established sharpening method was developed by Kaiser and Hamming [3]. It gives integer coefficients using analytic expression. The incorporation of the KaiserHamming polynomial in SCIC filter was first presented in [4]. This structure is further improved in [5]−[8]. Recently, the Chebyshev polynomials have been used in the design of SCIC filters with very high folding-band attenuations [9]. Furthermore, closed-form methods for the design of This paper was supported in part by Ericsson Nikola Tesla d.d. and University of Zagreb, Faculty of Electrical Engineering and Computing under the project Improvement of Competences for LTE Radio Access Equipment (ILTERA), and in part by Croatian Science Foundation under the project Beyond Nyquist Limit, grant no. IP-2014-09-2625. 100 SCIC filters using the weighted least-squares [10] and minimax [11] error criterion in the passband and folding bands have been proposed. 
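As a point of reference for the sharpening methods reviewed above, the short sketch below compares an unsharpened CIC response with the classic lowest-order Kaiser-Hamming sharpening f(x) = 3x^2 − 2x^3, assuming the standard CIC amplitude response given in Eq. (1) below. The decimation factor, filter order and evaluation frequency are arbitrary.

```python
import numpy as np

def h_cic(w, R, N):
    """Standard CIC amplitude response of Eq. (1), w in rad/sample, w != 0."""
    return (np.sin(R * w / 2) / (R * np.sin(w / 2))) ** N

R, N = 16, 2
w_edge = 2 * np.pi / R - 0.05          # a frequency near the first folding band
H = h_cic(w_edge, R, N)
H_kh = 3 * H**2 - 2 * H**3             # Kaiser-Hamming sharpened response
print("CIC:       %.1f dB" % (20 * np.log10(abs(H))))
print("sharpened: %.1f dB" % (20 * np.log10(abs(H_kh))))
```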
These methods provide SCIC filters ensuring small passband deviations and rather high folding-band attenuations. However, the coefficients obtained take real values, what results in the structure employing multipliers. In [12], the particle swarm optimization has been used to calculate SPT polynomial coefficients in order to achieve a given passband deviation. In [13], partially sharpened CIC filters have been developed, employing the polynomials in Bernstein's form [14], [15]. The proposed filters are multiplierless, but they only support power-of-two decimation factors. In [16] and [17], the sharpening has been combined with passband compensation. In this paper, we present a method for the design of multiplierless SCIC filters by using the minimax error criterion in folding bands only. The design is performed over the SPT coefficient space. To obtain the coefficients, we use a global optimization technique based on the interval analysis [18]. The presented method brings significant reduction in the filter complexity compared to the Chebyshev SCIC filters [9]. The paper is organized as follows. Section II briefly describes the sharpening of CIC filters. The method for the minimax design of multiplierless SCIC filters is presented in Section III. Section IV contains examples illustrating the features of the proposed filters. II. SHARPENED CIC FILTER The amplitude response of the CIC filter of the Nth order is given by [1]  R   1 sin  2     H CIC ( )   R     sin  2       N (1) where R denotes the decimation factor. The sharpening polynomial of the Mth order is given by MIPRO 2016/MEET Figure 1. Structure of sharpened CIC decimation filter. f ( x)  M  am x m (2) m 0 By substituting x = HCIC() into (2), we obtain the amplitude response of the sharpened CIC filter. It is given by M m ( ) H SCIC ( )  am H CIC m 0  M  m ( ) am H CIC (4) m 1 The structure of SCIC decimation filter is shown in Figure 1 [2]. It is clear from the figure that one multiplier is necessary for each non-zero polynomial coefficient. It makes the structure complex, especially for polynomials of high orders. However, by using the SPT coefficients, the multipliers are replaced by adders, resulting in an efficient structure. It should be noted that the structure is suitable if the delay elements introduce integer delays. It is achieved if N(R+1) is an even number. The structure can be further simplified such that elements with delays greater than R are split into two blocks by applying the noble identity. Consequently, one element operates at high, whereas the other one operates at low sampling rate. III. MINIMAX DESIGN OF MULTIPLIERLESS SHARPENED CIC FILTERS A. Minimax Approximation Two approaches to the design of sharpened CIC filters have been considered in literature. The first approach simultaneously sharpens the passband and the folding-band responses. The sharpening polynomial is obtained using the maximally flat [4], least squares [10], or minimax [11] approximation. The second approach includes sharpening only within the folding bands. Such an approach includes the design based on the Chebyshev polynomials [9]. To obtain high folding-band attenuations, the second approach is preferable. MIPRO 2016/MEET  (a)  max w( ) H SCIC ( , a)  H d ( ) (5)  where a is the vector of polynomial coefficients given by a  a1 a 2  a M  (3) It is well known that the amplitude response of a CIC filter has zeros placed at the central frequencies of the folding bands. 
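A minimal numerical sketch of (1)-(4) may make the sharpening step concrete. The function names and the handling of the limit at zero frequency are editorial choices, and the example coefficients are simply one M = 3, P = 1 entry that appears later in Table I; this is an illustration, not the optimization procedure of Section III.

```python
import numpy as np

def cic_amplitude(omega, R, N):
    """Amplitude response of an Nth-order CIC filter, Eq. (1):
    H_CIC(w) = [sin(R*w/2) / (R*sin(w/2))]**N, with the w -> 0 value set to its limit, 1."""
    omega = np.asarray(omega, dtype=float)
    num = np.sin(R * omega / 2.0)
    den = R * np.sin(omega / 2.0)
    h = np.ones_like(omega)
    nz = den != 0.0
    h[nz] = num[nz] / den[nz]
    return h ** N

def scic_amplitude(omega, R, N, a):
    """Sharpened response, Eq. (4): H_SCIC(w) = sum over m = 1..M of a_m * H_CIC(w)**m.
    Keeping a_0 = 0 preserves the folding-band zeros of the CIC response."""
    h = cic_amplitude(omega, R, N)
    return sum(a_m * h ** m for m, a_m in enumerate(a, start=1))

# Example: second-order CIC, R = 10, sharpened by the M = 3, P = 1 coefficients
# a = (2**-14, -2**-6, 2**0) tabulated for the narrowest passband in Table I.
w = np.linspace(0.0, np.pi, 4001)
H = scic_amplitude(w, R=10, N=2, a=[2.0**-14, -2.0**-6, 1.0])
```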
This property is ensured if a0 = 0. Therefore, we deal with the amplitude response H SCIC ( )  Here, we perform the minimax design of SCIC filters within the folding bands. We start the design with weighted absolute error T (6) w() is a positive weighting function, and Hd() is the desired amplitude response defined within the band Ω. The error function in (5) should take into account only the folding bands. They are given by 2n  2 n  p    p   R R     p     ; n  1,  , R 1 2 (7) for an even R, and by  2 n 2n  p    p R R ; n  1,  , R 1 2 (8) for an odd R, where p is the edge of filter's passband. In folding bands, the desired response is zero. In addition, we assume the unity weighting function. By substituting Hd() = 0 and w() = 1 into error function in (5), we arrive at  (a)  max H SCIC ( , a)  (9) Note that function in (9) does not take into account the passband gain of the filter. Therefore, we normalized it to constant passband gain at one frequency. Here, we choose the unity gain at  = 0. The error function thus takes the form H SCIC ( , a)  H SCIC (0, a)  (a)  max (10) The expression for HSCIC(0,a) is easily obtained by substituting  = 0 into (4), resulting in H SCIC (0, a)  M  am (11) m 1 Finally, by substituting (11) into (10), we arrive at 101 IV. DESIGN EXAMPLES M m ( )  am H CIC  (a)  max m 1  M (12)  am m 1 In the design, we calculate (a) using the CIC response evaluated on uniformly spaced frequency grid Q = {k; k=0, ..., K1} defined within the folding bands, . Hence, the objective function is obtained as M m (k )  am H CIC  (a)  max m 1 k Q M (13)  am m 1 B. Problem Formulation Our goal is to find the optimum SPT polynomial coefficients of the sharpened CIC filter in the minimax sense. Such a design is described by the optimization problem aˆ  arg min  (a) a (14) subject to: a is SPT representable According to the real polynomial coefficients obtained with the methods in [9] and [11], it is reasonable to assume the SPT coefficients have integer and fractional parts. Therefore, we deal with am expressed as am  S 1  bm, k 2k k F ; m  1,, M (15) where bm,k{1,0,1}, and S and F are the wordlengths of the integer and fractional part. Each bm,k  0 is called term. Generally, it occupies one adder in implementation. Therefore, the number of terms per polynomial or the number of terms per coefficient is limited to a prescribed value, P. For simplicity, here we use the latter approach. From (13) it is clear that ε(a) = ε(−a). Apparently, two global minimizers with opposite signs exist. However, we need the minimizer that provides positive gain of the filter. It is achieved by adding the constraint HSCIC(0,a) > 0 to the objective function. It simplifies the search because the optimization deals with only a half of the overall SPT coefficient space. To obtain the optimum SPT coefficients, we employ the global optimization technique based on the interval analysis. Recently, such a technique has been used in the minimax design of symmetric non-recursive filters [18]. According to the paper referred to, solving the problem in (13)−(15) is based on the interval extension of the objective function. In our design of SCIC filter, the interval extension of the objective function in (13) can be easily obtained by using the extensions of elementary operations and functions. 102 To illustrate the features of the proposed multiplierless SCIC filters, two examples of wideband filters are described. 
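Before turning to the examples, the folding-band objective of (7)-(13) can be sketched numerically. The grid construction and helper names below are editorial, the CIC response is restated inline so that the fragment stands alone, and the attenuation check at the end is only an illustration of how the tabulated AS values can be reproduced.

```python
import numpy as np

def cic_amplitude(w, R, N):
    # Eq. (1); the w -> 0 singularity is replaced by its limiting value of 1.
    den = R * np.sin(w / 2.0)
    safe = np.where(den == 0.0, 1.0, den)
    return np.where(den == 0.0, 1.0, np.sin(R * w / 2.0) / safe) ** N

def folding_band_grid(R, wp, points_per_band=200):
    """Uniform grid over the folding bands of Eqs. (7)-(8): intervals of half-width wp
    centred at 2*pi*n/R; for even R the band at w = pi is kept only up to pi."""
    bands = []
    for n in range(1, R // 2 + 1):
        c = 2.0 * np.pi * n / R
        bands.append(np.linspace(c - wp, min(c + wp, np.pi), points_per_band))
    return np.concatenate(bands)

def folding_band_error(a, R, N, wp):
    """Objective of Eq. (13): largest folding-band magnitude of the sharpened response,
    normalised by the passband (DC) gain sum(a_m) of Eq. (11)."""
    h = cic_amplitude(folding_band_grid(R, wp), R, N)
    h_scic = sum(a_m * h ** m for m, a_m in enumerate(a, start=1))
    return np.max(np.abs(h_scic)) / abs(sum(a))

# Cross-check against Table I: R = 10, N = 2, wp = 0.2*pi/R, a = (2**-14, -2**-6, 2**0).
eps = folding_band_error([2.0**-14, -2.0**-6, 1.0], R=10, N=2, wp=0.2 * np.pi / 10)
attenuation_dB = -20.0 * np.log10(eps)   # should land close to the 132 dB tabulated for this entry
```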
The first example presents simple sharpened CIC filters, whereas the second example describes a filter offering very high folding-band attenuations. The optimum SPT polynomial coefficients of the SCIC filters are given in Table I, together with their maximum passband deviations and minimum folding-band attenuations. The coefficients are tabulated for various orders of sharpening polynomials, passband edge frequencies, and filter complexities, assuming R = 10, K = 900, and S+F = 20. It is well known that for a given order of the CIC filter, the amplitude response negligibly changes the shape within the passband and folding bands for R ≥ 10 [1]. In that sense, the tabulated SPT coefficients can be used for sharpening of any CIC filter with R ≥ 10. A. Simple SCIC Filters Here, we describe the multiplierless sharpening of the CIC response with N = 2 and R = 10, by using the polynomial with M = 3 and P = 1, and the passband edge frequencies given in Table I. Figure 2 shows the magnitude responses of the SCIC filters obtained for p = 0.02 and p = 0.05. The filter with p = 0.02 ensures the minimum folding-band attenuation of 81 dB, whereas the filter with p = 0.05 exhibits the minimum attenuation of 106 dB and, as expected, higher passband droop. Other simple SCIC filters have the passband and folding-band responses placed somewhere between the described responses. From Table I, it is clear that provided filters exhibit rather high folding-band attenuations. However, they have the structure in which three general-purpose multipliers are replaced by only one adder. In addition, they have similar responses within || ≤ 0.5/R, what makes them suitable for uniform passband compensation. B. SCIC Filter Offering High Alias Rejection In this example, we illustrate the design of multiplierless SCIC filters with high alias rejection. Such a filter can be obtained by the sharpening of the CIC response with N = 2 and R = 10, by using the polynomial with M = 4 and P = 2. The passband edge frequency p = 0.04 is chosen. The optimum multiplierless filter is compared with the filter obtained by the Chebyshev sharpening of the same CIC response, but with real coefficients of the sharpening polynomial of the same order [9]. Figure 3 shows the magnitude responses of both filters. Generally, filters with SPT coefficients contain a real gain constant, which is usually not implemented in practice. Therefore, only for comparison purposes, our response is normalized to 0 dB at  = 0. It is clear from Figure 3 that the responses are very similar. 
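Reading the Table I entries requires expanding each coefficient from its signed power-of-two terms, Eq. (15). A small helper of the following kind (the tuple encoding and the names are editorial) shows how the column corresponding to this example (M = 4, P = 2, passband edge 0.4π/R) expands into numeric coefficients, and how the number of terms relates to the adder count discussed above.

```python
from fractions import Fraction

def spt_value(terms):
    """Numeric value of an SPT coefficient written as signed powers of two,
    a_m = sum_k b_k * 2**k with b_k in {-1, 0, +1} (Eq. (15));
    e.g. the Table I entry '-2^-3 + 2^-8' is encoded here as [(-1, -3), (+1, -8)]."""
    return sum(Fraction(s) * Fraction(2) ** k for s, k in terms)

# The M = 4, P = 2, wp = 0.4*pi/R column of Table I (the filter of this example).
a_spt = [
    [(-1, -15), (-1, -17)],   # a1 = -2^-15 - 2^-17
    [(+1, -8),  (+1, -14)],   # a2 =  2^-8  + 2^-14
    [(-1, -3),  (+1, -8)],    # a3 = -2^-3  + 2^-8
    [(+1, 0),   (+1, -3)],    # a4 =  2^0   + 2^-3
]
a = [float(spt_value(t)) for t in a_spt]
n_terms = sum(len(t) for t in a_spt)   # each non-zero term generally maps onto one adder
```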
The Chebyshev response ensures the minimum folding-band attenuation of 141 dB, whereas the multiplierless filter exhibits somewhat smaller attenuation MIPRO 2016/MEET TABLE I SPT POLYNOMIAL COEFFICIENTS, PASSBAND DROOP (AP), AND MINIMUM FOLDING-BAND ATTENUATION (AS) OF VARIOUS MINIMAX SHARPENED CIC FILTERS WITH N = 2 AND R  10 M=3 am a1 a2 a3 AP, dB AS, dB p=0.2/R P=1 2–14 –2–6 20 0.86 132 P=2 –14 2 +2–19 –2–6–2–10 20–2–5 0.86 142 p=0.25/R P=1 2–12 –2–5 20 1.35 125 P=2 –12 2 –2–14 –2–5+2–11 20+2–3 1.35 129 p= /(3R) P=1 2–10 –2–4 20 2.43 106 P=2 –10 2 –2–12 –2–4–2–9 20+2–2 2.42 113 p=0.4/R P=1 2–11 –2–4 20 3.52 94.9 P=2 –9 2 +2–14 –2–3+2–7 21–2–1 3.54 102 p=0.5/R P=1 2–8 –2–3 20 5.69 81.0 P=2 2 –2–12 –2–3–2–8 20+2–7 5.70 89.3 –8 M=4 am a1 a2 a3 a4 AP, dB AS, dB p= /(3R) P=1 0 2–10 –2–4 20 3.23 144 P=2 –2–16+2–19 2–9–2–14 –2–4–2–6 20+2–8 3.24 150 p=0.4/R P=1 2–15 –2–14 –2–4 20 4.67 128 P=2 –2–15–2–17 2–8+2–14 –2–3+2–8 20+2–3 4.73 139 p=0.5/R P=1 2–13 2–9 –2–3 20 7.50 110 P=2 –2–12+2–18 2–6+2–10 –2–2–2–4 21–2–2 7.62 120 p=0.6/R P=1 –2–14 2–6 –2–2 20 11.4 96.4 P=2 –2–11–2–14 2–5–2–7 –2–2–2–5 20+2–6 11.5 105 p=2 /(3R) P=1 –2–12 2–6 –2–2 20 14.3 87.7 P=2 –2–9+2–14 2–4+2–10 –2–1–2–3 21–2–3 14.7 97.6 M=5 am a1 a2 a3 a4 a5 AP, dB AS, dB p=0.5/R P=1 –2–18 2–12 2–10 –2–3 20 9.32 139 P=2 2–18 –2–11+2–17 2–6+2–10 –2–2+2–5 20–2–6 9.52 150 p=0.6/R P=1 –2–16 2–13 2–6 –2–2 20 14.0 122 P=2 2–14 –2–8 –4 2 +2–6 –2–1–2–3 21–2–2 14.4 132 p=2 /(3R) P=1 –2–14 2–9 2–8 –2–2 20 17.7 109 of 139 dB. In the passband, the filters introduce nearly the same droop of 4.73 dB. From the complexity point of view, the Chebyshev SCIC filter needs five general-purpose multipliers to incorporate sharpening polynomial in the structure. However, our filter employs only six adders instead. V. CONCLUSION The design of multiplierless sharpened CIC filters based on minimax approximation in folding bands was presented. The design was formulated as an unconstrained optimization problem. The optimum polynomial coefficients are obtained by using the global optimization technique based on the interval analysis. The proposed filters exhibit similar amplitude behavior as the Chebyshev sharpened CIC filters. From the complexity point of view, the multiplierless filters are favorable, because they bring significant reduction in the structure. MIPRO 2016/MEET P=2 2 +2–18 –2–7+2–12 2–3+2–6 –20–2–6 21+2–1 18.3 123 –13 p=0.75/R P=1 2–12 2–11 –2–13 –2–2 20 22.9 93.6 P=2 2 –2–13 –2–6+2–10 2–2–2–4 –20+2–5 21–2–2 24.7 111 –11 p=0.8/R P=1 –2–10 2–8 2–3 –20 21 28.7 91.7 P=2 2 –2–17 –2–6–2–10 2–2–2–7 –20–2–2 21+2–3 29.2 102 –12 VI. REFERENCES [1] E. B. Hogenauer, “An economical class of digital filters for decimation and interpolation,” IEEE Trans. Acoust., Speech, Signal Process., vol. 29, no. 2, pp. 155–162, Apr. 1981. [2] T. Saramäki and T. Ritoniemi, “A modified comb filter structure for decimation,” in Proc. IEEE Int. Symp. Circuits Syst., vol. 4, Hong Kong, Jun. 1997, pp. 2353–2356. [3] J. Kaiser and R. Hamming, “Sharpening the response of a symmetric nonrecursive filter by multiple use of the same filter,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-25, no. 5, pp. 415–422, Oct. 1977. [4] A. Y. Kwentus, Z. Jiang, and A. N. Willson Jr., “Application of filter sharpening to cascaded integrator-comb decimation filters,” IEEE Trans. Signal Process., vol. 45, no. 2, pp. 457–467, Feb. 1997. [5] G. Stephen and R. W. Stewart, “High-speed sharpening of decimating CIC filter,” Electron. Lett., vol. 40, no. 21, pp. 
1383–1384, Oct. 2004. [6] G. Jovanovic Dolecek and S. K. Mitra, “A new two-stage sharpened comb decimator,” IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 52, no. 7, pp. 1414–1420, Jul. 2005. 103 Figure 2. Magnitude responses of sharpened CIC filters with N = 2 and R = 10, obtained by minimax SPT sharpening with M = 3 and P = 1, assuming passbands are ||  0.02 and ||  0.05. Figure 3. Magnitude responses of sharpened CIC filters with N = 2 and R = 10, obtained by minimax SPT sharpening with M = 4 and P = 2, and by Chebyshev sharpening of fourth order. Passband is ||  0.04. [7] Q. Liu and J. Gao, “Efficient comb decimation filter with sharpened magnitude response,” in Proc. 5th Int. Conf. WICOM, 2009, pp. 1–4. [8] H. Zaimin, H. Yonghui, K. Wang, J. Wu, J. Hou, and L. Ma, “A novel CIC decimation filter for GNSS receiver based on software defined radio,” in Proc. 7th Int. Conf. WICOM, 2011, pp. 1–4. [9] J. O. Coleman, “Chebyshev stopbands for CIC decimation filters and CIC-implemented array tapers in 1D and 2D,” IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 59, no. 12, pp. 2956–2968, Dec. 2012. [10] G. Molnar, M. Glavinic Pecotic, and M. Vucic, “Weighted least-squares design of sharpened CIC filters,” in Proc. 36th Int. Conf. MIPRO, vol. MEET, Opatija, Croatia, May 2013, pp. 104–108. [11] G. Molnar and M. Vucic, “Weighted minimax design of sharpened CIC filters,” in Proc. IEEE 20th ICECS, Abu Dhabi, UAE, Dec. 2013, pp. 869–872. [12] M. Laddomada, D. E. Troncoso, and G. Jovanovic Dolecek, “Improved sharpening of comb-based decimation filters: Analysis and design,” in Proc. IEEE 11th Int. Conf. CCNC, 2014, pp. 11–16. [13] M. G. C. Jiménez, V. C. Reyes, and G. Jovanovic Dolecek, “Sharpening of non-recursive comb decimation structure,” in Proc. 13th ISCIT, 2013, pp. 458–463. [14] R. J. Hartnett and G. F. Boudreaux-Bartels, “Improved filter sharpening,” IEEE Trans. Signal Process., vol. 43, no.12, pp. 2805–2810, Dec. 1995. [15] S. Samadi, “Explicit formula for improved filter sharpening polynomial,” IEEE Trans. Signal Process., vol. 48, no. 10, pp. 2957–2959, Nov. 2000. [16] D. E. Troncoso Romero, M. Laddomada, and G. Jovanovic Dolecek, “Optimal sharpening of compensated comb decimation filters: analysis and design,” Sci. World J., Hindawi, vol. 2014, ID 950860, 9 pages. [17] M. G. C. Jiménez, D. E. Troncoso Romero and G. Jovanovic Dolecek, “On simple comb decimation structure based on Chebyshev sharpening,” in Proc. IEEE 6th LASCAS, Feb. 2015, pp. 1–4. [18] M. Vucic, G. Molnar, and T. Zgaljic, “Design of FIR filters based on interval analysis,” in Proc. 33rd Int. Conf. MIPRO, vol. MEET & GVS, Opatija, Croatia, May 2010, pp. 197–202. 104 MIPRO 2016/MEET Minimization of Maximum Electric Field in High-Voltage Parallel-Plate Capacitor Raul Blecic∗† , Quentin Diduck‡ , Adrijan Baric∗ ∗ University of Zagreb, Faculty of Electrical Engineering and Computing, Unska 3, 10000 Zagreb, Croatia Tel: +385 (0)1 6129547, fax: +385 (0)1 6129653, e-mail: raul.blecic@fer.hr † KU Leuven, ESAT-TELEMIC, Kasteelpark Arenberg 10, 3001 Leuven, Belgium ‡ Ballistic Devices Inc, 904 Madison St, Santa Clara, CA, 95050 United States Abstract—Minimization of maximum electric field of a parallel-plate capacitor for high-voltage and temperature stable applications is presented. Cubic zirconia is used as a dielectric material because of its high relative permittivity, high dielectric strength and high temperature stability. 
The maximum electric field present in the structure limits the maximum achievable capacitance of the capacitor structure. Reducing the maximum electric field of the capacitor makes it possible to reduce the thickness of the dielectric material, which increases its capacitance. The impact of the geometrical and electrical parameters of the parallel-plate capacitor on the maximum electric field is analyzed by a 2D multiphysics solver. Guidelines for the minimization of the maximum electric field are given.
Index Terms—cubic zirconia, dielectric strength, edge effects, electrostatics, fringing fields.
I. INTRODUCTION
The high relative permittivity, high dielectric strength and high temperature stability of cubic zirconia [1], [2], [3] make it a candidate dielectric material for high-voltage, temperature-stable capacitors. Most of the capacitance of a parallel-plate capacitor is contained in the central part of the structure. Yet, the electric field at the edges of the structure can be several times higher than the field in its center [4], which reduces the maximum achievable capacitance of the capacitor. Reducing the maximum electric field present in the capacitor makes it possible to reduce the thickness of the dielectric material, which increases the capacitance. The objective of this paper is to analyze the impact of the geometrical and electrical parameters of a parallel-plate capacitor on the maximum electric field, in order to minimize the maximum electric field present in the capacitor and consequently to maximize its capacitance. The analysis is performed by a 2D multiphysics solver.
This paper is structured as follows. Section II introduces a parallel-plate capacitor. Section III presents a numerical analysis of the impact of the electrical and geometrical parameters of the structure on the maximum electric field. Section IV concludes the paper.
II. CAPACITANCE OF A PARALLEL-PLATE CAPACITOR
The capacitance of a parallel-plate capacitor, neglecting the contribution of the fringing fields, can be expressed as:
Cpp = ε · S / d,  (1)
where ε is the permittivity of the material between the plates, S is the area of the plates and d is the separation between the plates. The total capacitance of a parallel-plate capacitor, taking the fringing fields into account (assuming that the plates are completely surrounded by the same material), can be expressed by Palmer's equation [5]:
Ctotal = ε · (S / d) · Ka · Kb,  (2)
Ka = 1 + (d / (π·a)) · (1 + ln(2·π·a / d)),  (3)
Kb = 1 + (d / (π·b)) · (1 + ln(2·π·b / d)),  (4)
where ε is the permittivity of the material which surrounds the plates, while a and b are the width and the length of the plates (S = a · b). The relative contribution of the fringing fields to the total capacitance of a square parallel-plate capacitor, calculated as Cff = Ctotal − Cpp, is shown in Fig. 1. The contribution of the fringing-field capacitance to the total capacitance of the parallel-plate capacitor is negligible if the size of the plates is large relative to the separation of the plates (a >> d and b >> d), which is typically true for high-capacitance capacitors. Although the contribution of the fringing fields to the capacitance of a parallel-plate capacitor is negligible for large plates, the fringing fields can be several times larger than the field in the center of the capacitor [4]. The fringing fields limit the minimum separation of the plates and the maximum achievable capacitance of the capacitor.
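As a small worked illustration of (1)-(4), the ideal and Palmer capacitances can be evaluated directly; the 25 mm square plate size is just one point of the sweep plotted in Fig. 1, and the function name is editorial.

```python
import math

def palmer_capacitance(eps, a, b, d):
    """Total parallel-plate capacitance including fringing fields, Eqs. (2)-(4):
    C_total = eps * (a*b/d) * Ka * Kb."""
    Ka = 1.0 + d / (math.pi * a) * (1.0 + math.log(2.0 * math.pi * a / d))
    Kb = 1.0 + d / (math.pi * b) * (1.0 + math.log(2.0 * math.pi * b / d))
    return eps * (a * b / d) * Ka * Kb

eps0 = 8.854e-12              # permittivity of vacuum, F/m
eps = 17.0 * eps0             # cubic zirconia, eps_r = 17
a = b = 25e-3                 # square plates, 25 mm x 25 mm
d = 0.15e-3                   # plate separation, 0.15 mm
C_pp = eps * a * b / d                       # Eq. (1), fringing neglected
C_total = palmer_capacitance(eps, a, b, d)   # Eqs. (2)-(4)
C_ff = C_total - C_pp                        # fringing-field contribution plotted in Fig. 1
```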
Reducing the maximum electric field enables the reduction of the separation between the plates, which increases the capacitance. III. 2D S IMULATIONS A. Simulation Domain The simulation domain of a parallel-plate capacitor in a 2D multiphysics solver is shown in Fig. 2. It consists of two 0.1-mm thick copper plates. Material between the copper plates is a 0.15-mm thick cubic zirconia which has a high relative permittivity (r,CZ = 17), high dielectric strength and high temperature stability. The capacitor is enclosed by a surrounding material. The surrounding material has to be able to sustain large electric fields 105 Ctotal [nF] 12 8 TABLE I PARAMETERS OF THE S IMULATED S TRUCTURE . d = 0.15 mm d = 0.3 mm d = 1.5 mm 4 0 0 25 50 75 100 Width of square copper plates, a = b [mm] (a) Cf f [nF] 0.12 0.08 d = 0.15 mm d = 0.3 mm d = 1.5 mm Parameter Value Width of the copper plates, wCu [mm] Width of the surrounding material, wsm [mm] Thickness of the cubic zirconia, tCZ [mm] Thickness of the copper plates, tCu [mm] Relative permittivity of the cubic zirconia, r,CZ Losses of the cubic zirconia, tan δCZ Losses of the surrounding material, tan δsm Conductivity of copper, σCu [S/m] 1 5 0.15 0.1 17 0 0 6e7 TABLE II VARIABLES OF THE S IMULATED S TRUCTURE . 0.04 0 0 25 50 75 100 Width of square copper plates, a = b [mm] Variable Values Width of the cubic zirconia, wCZ [mm] Relative permittivity of the surrounding material, r,sm 0.8–1.6 1, 4, 20, 100 (b) Cf f /Ctotal [%] 100 d = 0.15 mm d = 0.3 mm d = 1.5 mm 75 50 25 0 0 25 50 75 100 Width of square copper plates, a = b [mm] (c) Fig. 1. Capacitance of a square parallel-plate capacitor as a function of the size of the plates: (a) total capacitance, (b) fringing-field capacitance (Cf f = Ctotal − Cpp ), and (c) relative contribution of the fringing fields (Cf f = Ctotal − Cpp ) to the total capacitance. C. Impact of the Width of Cubic Zirconia and the Dielectric Constant of the Surrounding Material Surrounding material wCu wsm Cu tCu tCZ CZ tCu Cu wCZ wsm Fig. 2. Simulation domain in a 2D multiphysics solver. present at the edges of the copper plates (has to have a high dielectric strength), however, its relative permittivity and temperature stability do not significantly contribute to the overall capacitance and temperature stability of the capacitor. B. Methodology Stationary electrostatic analysis is performed. Boundary condition is set to perfect insulator (modelled as the zero 106 charge boundary condition [6]) and it is placed sufficiently far away from the parallel-plate capacitor to have a negligible impact on the simulation results. The top copper plate is set to 1 V, while the bottom copper plate is set to 0 V. This gives the electric field in the center of the structure equal to Ecenter = (1 V)/tCZ = 6.67 kV/m. The parameters and variables of the simulated structure are given in Table I and Table II, respectively. The width of the copper plates is set to a small value (wCu = 1 mm) to reduce the simulation domain. In actual application, it is much larger to provide a sufficiently large capacitance. Parametric simulations for the different widths of cubic zirconia wCZ and for four values of the dielectric constant of the surrounding material r,sm is performed. The maximum electric fields in the cubic zirconia and in the surrounding material are shown in Fig. 3. 
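For orientation, the fixed simulation parameters of Table I and the reference field used for normalisation can be restated explicitly; the dictionary layout below is editorial and contains only values already given above.

```python
# Fixed parameters of the simulated structure (Table I), SI units.
params = {
    "w_Cu":     1e-3,     # width of the copper plates, m
    "w_sm":     5e-3,     # width of the surrounding material, m
    "t_CZ":     0.15e-3,  # thickness of the cubic zirconia, m
    "t_Cu":     0.1e-3,   # thickness of the copper plates, m
    "eps_r_CZ": 17.0,     # relative permittivity of the cubic zirconia
    "tan_d_CZ": 0.0,      # losses of the cubic zirconia
    "tan_d_sm": 0.0,      # losses of the surrounding material
    "sigma_Cu": 6e7,      # conductivity of copper, S/m
}

# Reference (uniform) field between the plates for the 1 V excitation applied in the solver.
E_center = 1.0 / params["t_CZ"]   # = 6.67e3 V/m, i.e. the 6.67 kV/m quoted in the text
```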
The results can be divided into three regions, the first region in which the width of the cubic zirconia is smaller than the width of the copper plates (wCZ < 0.95 mm), the second region in which the width of the cubic zirconia is larger than the width of the copper plates (wCZ > 1.1 mm), and the third region in which the two sizes are comparable (0.95 < wCZ < 1.1 mm). The results in the first region show that the maximum electric field in the cubic zirconia decreases as its width decreases. Also, it shows a weak dependence on the relative permittivity of the surrounding material. The maximum field in the surrounding material does not depend on its relative permittivity nor on the width of the cubic zirconia. In the second region, the maximum electric fields in the cubic zirconia and in the surrounding material depend on the relative permittivity of the surrounding material. Increasing it decreases the maximum electric field in both materials. In the third region, the maximum electric fields in MIPRO 2016/MEET max(ECZ ) [kV/m] 120 Surrounding material (r,sm = 100) ǫr,sm =1 ǫr,sm =4 ǫr,sm =20 ǫr,sm =100 90 Cu CZ (wCZ = 0.8 mm) Cu 60 30 0 0.8 1 1.2 1.4 Width of cubic zirconia, wCZ [mm] 1.6 (a) Surrounding material (r,sm = 100) max(Esm ) [kV/m] (a) 120 ǫr,sm =1 ǫr,sm =4 ǫr,sm =20 ǫr,sm =100 Cu CZ (wCZ = 1.6 mm) Cu 90 60 30 0 0.8 (b) 1 1.2 1.4 Width of cubic zirconia, wCZ [mm] 1.6 Surrounding material (r,sm = 1) (b) Cu CZ (wCZ = 1.6 mm) Cu Fig. 3. Maximum electric field as a function of the width of the cubic zirconia and the relative permittivity of the surrounding material: (a) in the cubic zirconia and (b) in the surrounding material. both materials depend on the relative permittivity of the surrounding material. Increasing it increases the maximum electric field in both materials. For relative permittivity of the surrounding material larger than that of the cubic zirconia r,sm > r,CZ , the peak maximum electric field in both materials is obtained for wCZ = wCu . Comparing all three regions, the first region shows the smallest maximum electric field in the cubic zirconia. The second region, for the relative permittivity of the surrounding material larger than that of the cubic zirconia r,sm > r,CZ , shows the smallest maximum field in the complete structure (in the dielectric material and in the surrounding material). Also, the impact of the surrounding material on the temperature stability of the capacitor is expected to be the smallest in the second region since the surrounding material is not present between the plates. Therefore, the second region (for relative permittivity of the surrounding material larger than that of the cubic zirconia r,sm > r,CZ ) is considered to be the best solution for the temperature stable applications. The third region does not provide any significant advantage over the first and the second region. The electric field distributions in the simulated structure for three characteristic cases (wCZ = 0.8 mm, r,sm = 100; wCZ = 1.6 mm, r,sm = 100 and wCZ = 1.6 mm, r,sm = 1) are shown in Fig. 4. The maximum electric fields in the cubic zirconia and in the surrounding material for wCZ = 0.8 mm and for wCZ = 1.6 mm are given in Table III. D. Impact of the Shape of the Copper Plates The impact of the shape of the copper plates on the maximum electric field is analyzed as follows. Four structures are analyzed and compared. The first one is MIPRO 2016/MEET (c) Fig. 4. 
Electric field in the simulated structure for: (a) wCZ = 0.8 mm, r,sm = 100, (b) wCZ = 1.6 mm, r,sm = 100 and (c) wCZ = 1.6 mm, r,sm = 1. The maximum electric field can be observed at the edges of the copper plates. TABLE III M AXIMUM E LECTRIC F IELD IN THE C UBIC Z IRCONIA AND IN THE S URROUNDING M ATERIAL FOR THE W IDTH OF THE C UBIC Z IRCONIA wCZ = 0.8 MM AND wCZ = 1.6 MM , AND FOR THE R ELATIVE P ERMITTIVITY OF THE S URROUNDING M ATERIAL r,sm = 1, 4, 20, 100. wCZ [mm] max(ECZ ) [kV/m] max(Esm ) [kV/m] 1 0.8 1.6 6.67 98.42 38.24 92.41 4 0.8 1.6 6.67 75.79 38.24 69.31 20 0.8 1.6 6.70 35.21 38.24 28.17 100 0.8 1.6 6.72 15.43 38.24 11.48 r,sm the structure with rectangular edges as shown in Fig. 2. The second structure has triangular edges (cut at 45◦ ) (Fig. 5(a)), the third has round edges (with the radius r = tCu /2 = 0.05 mm) (Fig. 5(b)), while the fourth has Rogowski profile edges (ψ = π/2) terminated with a circular section [7] (with the radius r = 0.031 mm) (Fig. 5(c)). The parameters of the simulated structures are given in Table I. The width of the cubic zirconia is set to wCZ = 1.6 mm. The maximum electric fields in the cubic zirconia and in 107 Surrounding material wCu wsm Cu tCu tCZ CZ tCu Cu TABLE IV M AXIMUM E LECTRIC F IELD IN THE C UBIC Z IRCONIA AND IN THE S URROUNDING M ATERIAL FOR THE R ECTANGULAR , T RIANGULAR AND ROUND E DGES OF THE C OPPER P LATES , AND FOR THE R ELATIVE P ERMITTIVITY OF THE S URROUNDING M ATERIAL r,sm = 1, 4, 20, 100 (wCZ = 1.6 MM ). r,sm shape of edges max(ECZ ) [kV/m] max(Esm ) [kV/m] 1 rectangular triangular round Rogowski profile 98.42 91.67 20.07 7.37 92.41 115.72 332.40 122.08 4 rectangular triangular round Rogowski profile 75.79 54.56 12.25 6.81 69.31 64.04 51.03 29.03 20 rectangular triangular round Rogowski profile 35.21 19.26 8.54 6.67 28.17 13.63 7.65 9.04 100 rectangular triangular round Rogowski profile 15.43 9.34 7.45 6.72 11.48 7.08 3.17 4.63 wCZ wsm (a) Surrounding material wCu wsm Cu tCu tCZ CZ tCu Cu wCZ wsm (b) Surrounding material wCu wsm Cu tCu tCZ CZ tCu Cu wCZ wsm (c) Fig. 5. The shapes of copper plates of the parallel-plate capacitor: (a) triangular edges, (b) round edges (r = tCu /2 = 0.05 mm) and (c) Rogowski profile edges (ψ = π/2) terminated with a circular section (r = 0.031 mm). the surrounding material for the four simulated structures are given in Table IV. The results show that the Rogowski profile edges provide the smallest maximum electric field in the cubic zirconia among the four simulated structures for any value of the relative permittivity of the surrounding material. The maximum electric field in the surrounding material for the structure with the triangular, round or Rogowski profile edges is significantly reduced only if the relative permittivity of the surrounding material is larger 108 than that of the cubic zirconia (r,sm > r,CZ ). Although the structure with the Rogowski profile edges shows the smallest maximum electric field, the design of such structure with solid dielectric materials presents an additional challenge from a mechanical point of view. 
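The ψ = π/2 contour of Fig. 5(c) can be generated from the commonly quoted parametrization of the plate-edge equipotentials. The exact scaling used in the paper, and the circular termination with r = 0.031 mm, are not reproduced here; the sketch only indicates how such a profile is typically constructed.

```python
import numpy as np

def rogowski_profile(d, psi=np.pi / 2.0, u_min=-3.0, u_max=1.5, points=200):
    """Equipotential psi of the classical parallel-plate edge-field mapping,
    x = (d/pi)*(u + exp(u)*cos(psi)),  y = (d/pi)*(psi + exp(u)*sin(psi)),
    for a plate spacing d; psi = pi/2 gives the Rogowski electrode contour."""
    u = np.linspace(u_min, u_max, points)
    x = d / np.pi * (u + np.exp(u) * np.cos(psi))
    y = d / np.pi * (psi + np.exp(u) * np.sin(psi))
    return x, y

# Edge contour scaled to the 0.15 mm dielectric thickness of the simulated capacitor.
x, y = rogowski_profile(d=0.15e-3)
```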
The maximum electric field in the cubic zirconia for the triangular edges and for the relative permittivity of the surrounding material equal to r,sm = 100 is 40% larger than the electric field in the center of the structure, while for the round edges and for the relative permittivity of the surrounding material equal to r,sm = 100 is only 12% larger than the electric field in the center of the structure which provides a significant improvement when compared to the structure with the rectangular edges. IV. C ONCLUSION The impact of geometrical and electrical parameters of the parallel-plate capacitor, with the cubic zirconia used as a dielectric material, on the maximum electric field present in the capacitor structure is analyzed by a 2D multiphysics solver. The maximum electric field present in the capacitor can be significantly reduced if the width of the dielectric is designed to be larger than the width of the copper plates and if the capacitor is enclosed by a material with a large relative permittivity, i.e. larger than the relative permittivity of the dielectric material. The maximum electric field can be further reduced by patterning the edges of the copper plates in a triangular, round or Rogowski profile shape. R EFERENCES [1] C.-H. Chang, B. Hsia, J. P. Alper, S. Wang, L. E. Luna, C. Carraro, S.-Y. Lu, and R. Maboudian, “High-temperature All Solid-state Microsupercapacitors Based on SiC Nanowires Electrode and YSZ Electrolyte,” ACS Appl. Mater. Interfaces, vol. 7, pp. 26 658–26 665, 2015. MIPRO 2016/MEET [2] O. Jongprateep, V. Petrosky, and F. Dogan, “Yttria Stabilized Zirconia as a Candidate for High Energy Density Capacitor,” Kasetsart Journal, vol. 42, pp. 373–377, 2008. [3] ——, “Effects of Yttria Concentration and Microstructure on Electric Breakdown of Yttria Stabilized Zirconia,” Journal of Metal, Materials and Minerals, vol. 18, no. 1, pp. 9–14, 2008. [4] K. P. P. Pillai, “Fringing field of finite parallel-plate capacitors,” Proceedings of the Institution of Electrical Engineers, vol. 117, no. 6, pp. 1201–1204, June 1970. MIPRO 2016/MEET [5] M. Hosseini, G. Zhu, and Y.-A. Peter, “A new formulation of fringing capacitance and its application to the control of parallelplate electrostatic micro actuators,” in Analog Integr. Circ. Sig. Process, 2007, pp. 119–128. [6] COMSOL, AC/DC Module User’s Guide. COMSOL, Nov. 2013. [7] N. G. Trinh, “Electrode Design for Testing in Uniform Field Gaps,” IEEE Transactions on Power Apparatus and Systems, vol. PAS-99, no. 3, pp. 1235–1242, May 1980. 109 Modelling SMD Capacitors by Measurements Roko Mišlov∗ , Marko Magerl∗ , Sandra Fratte-Sumper† , Bernhard Weiss† , Christian Stockreiter† and Adrijan Barić∗ ∗ University of Zagreb Faculty of Electrical Engineering and Computing, Unska 3, 10000 Zagreb, Croatia Email: roko.mislov@fer.hr † ams AG, Tobelbader Strasse 30, Premstaetten 8141, Austria Email: christian.stockreiter@ams.com Abstract—The lumped models of several capacitors in SMD-0603 package are extracted from S-parameter measurements. The S-parameters of the SMD components are obtained: (i) by modelling the RF connector and transmission lines and de-embedding, and (ii) by directly calibrating the reference plane to the SMD component using on-board calibration standards. The extracted parasitics related to the SMD component package are compared for capacitors of different nominal values and for the two calibration methods in the frequency range up to 8 GHz. 
Index Terms—SMD capacitor modelling, SMA connector modelling, CPW-CB modelling, S-parameter measurements (a) Test structure (i): SMA connectors connected by CPWG lines. (b) Test structure (ii): SMA connectors, CPWG lines and the DUT. I. I NTRODUCTION Surface mount (SMD) capacitors are used in many electronics applications, e.g. decoupling in power delivery networks (PDN) [1], injection of the RF disturbance in electromagnetic immunity measurements, such as direct power injection (DPI) [2], or in the output filters for DC/DC converters [3]. A high-frequency capacitor model available in the design phase can significantly improve the performance of the first-pass design. Several modelling methods for passive SMD components are reported in literature [4], [5], [6], where the lumped or lumped-distributed models are built using S-parameters of the SMD components. The S-parameters are measured using a vector network analyzer (VNA), primarily because of its high dynamic range and broad frequency range. Alternatively, the S-parameters can be obtained by applying the inverse Fourier transform on the time-domain reflectometry (TDR) measurements [7]. The reference plane of the VNA measurements is calibrated to the device-under-test (DUT) by measuring well-defined short, open, load and through (SOLT) calibration standards. The VNA calibration kit enables calibration up to the end of the coaxial cables. In [4] on-board calibration standards are used to calibrate the reference up to the DUT on the PCB. The on-board open calibration standard is characterized using VNA port extension, based on the assumption that the remaining on-board calibration standards are well-defined. An alternative calibration approach is to model the fixture between the coaxial cables and the DUT: the radio-frequency (RF) connectors and the transmission lines. The model can be used to de-embed the fixture from the measurements obtained by calibrating the reference 110 (c) Test structure (ii): SMA connectors, CPWG lines and the OPEN standard. Fig. 1: Manufactured test structures. plane using the well-defined calibration standards in the VNA calibration kit. In this paper, both of these calibration methods are used and compared. Instead of using port extension, the on-board calibration standards are characterized by de-embedding their S-parameters using the fixture model. Additional test structures are manufactured and measured, in order to characterize the substrate, the RF connectors and the transmission lines. The transmission lines are designed as conductor-backed coplanar waveguide (CPWG) according to the design guidelines given in [5]. Three capacitors of different nominal values, but in identical SMD-0603 package are modelled. This paper is organised as follows. The measurement setup is presented in Section II, and the measurement results are shown in Section III. The modelling methodology and results are presented in Section IV, and discussed in Section V. The paper is concluded in Section VI. II. M EASUREMENT SETUP A. Test structures Two types of test structures are manufactured on printed circuit boards (PCB): (i) structures for FR4 substrate char- MIPRO 2016/MEET TABLE I: The nominal substrate values and the 50 Ω CPWG line geometry calculated using [11]. parameter εr tan δ substrate height [mm] Cu thickness [µm] trace width [mm] clearence to GND [mm] nominal value 4.5 0.02 1.5 35 1.548 0.3 TABLE II: Substrate permittivity calculated using the phase difference of CPWG lines: l1 =25 mm, l2 =50 mm, l3 =100 mm. 
used lines εef f,avg εr 2.973 3.017 3.039 4.848 4.935 4.980 l1 , l2 l1 , l3 l2 , l3 TABLE III: Optimized values of the SMA connector and transmission line models. parameter value C1 [pF], L1 [nH] C2 [pF], L2 [nH] C3 [pF], L3 [nH] lCP W G [mm] tan δ 0.0044, 0.0326 0.2798, 0.0010 0.1573, 0.0466 23.65 0.0234 Fig. 2: The effective permittivity of the FR4 substrate extracted from measurements of test structures (i). Fig. 3: The SMA connector model. acterization and RF connector modelling, and (ii) structures for measuring the SMD capacitors and on-board calibration standards. Test structures (i), shown in Fig. 1a, consist of two SubMiniature version A (SMA) connectors [8] connected by CPWG transmission lines of several different lengths. The vias connecting the ground planes are placed along the trace, as recommended in [5]. Test structures (ii), shown in Figs. 1b, 1c, consist of pads for 2-port measurements of SMD-0603 components connected to SMA connectors by CPWG transmission lines. In Fig. 1b this test structure is used to characterize the DUT as a 2-port component, while in Fig. 1c the same test structure is used to measure the on-board open calibration standard. The on-board open consists of two parallel 1 pF capacitors soldered between each DUT pad and ground. All components of the on-board calibration standards have identical SMD-0603 package as the modelled SMD capacitors. The CPWG transmission lines are designed to have the characteristic impedance of 50 Ω on the FR4 substrate. The transmission line geometric parameters, as well as the nominal substrate parameters used for their calculation, are summarized in Table I. The S-parameters are measured in the frequency range from 300 kHz to 8 GHz using a 2-port VNA [9]. Each measurement is taken in 1600 frequency points in the logarithmic scale using a 500 Hz resolution bandwidth and an averaging factor of 3. The modelling is done using Advanced Design System 2015.01 (ADS) [10]. B. Substrate characterization The effective permittivity of the FR4 substrate is calculated using the S-parameter measurements of the test structures (i) described in Section II-A. The following MIPRO 2016/MEET transmission line lengths are manufactured: l1 = 25 mm, l2 = 50 mm, l3 = 100 mm. The phase difference between the S21 parameters of any two of these test structures is related to the difference of the physical line lengths by the equation: √ 2πf εef f 2π ∆ϕ = β∆l = ∆l, (1) ∆l = λf c where ∆l is the difference in physical line length of the two test structures, β is the wave number, ∆ϕ is the phase difference of the S21 parameters at each frequency f , and εef f is the effective permittivity. This relation enables the calculation of the effective permittivity as a function of frequency:  εef f (f ) = c∆ϕ 2πf ∆l 2 , (2) and it is shown in Fig. 2. The average value of the substrate permittivity over the frequency range from 1 GHz to 8 GHz is given in Table II for all three pairs of transmission line measurements. The value of the substrate permittivity εr is calculated from the average effective permittivity εef f,avg using the CPWG transmission line model [12] and the frequency independent dielectric loss model of the substrate in ADS. C. SMA connector modelling The topology of the SMA connector model is a simplified version of the lumped-distributed connector model presented in [13]. The model is shown Fig. 3. The components L1 and C1 represent the transition between the coaxial cable and the SMA connector. 
The transmission 111 Measured Modelled -60 -80 -100 3x105 106 107 108 Freq [Hz] (a) S11 109 200 100 -1 100 -2 0 0 -100 -3 -200 8x109 -4 3x105 106 -100 Measured Modelled 107 Phase [deg] -40 0 Magnitude [dB] Magnitude [dB] -20 200 Phase [deg] 0 108 Freq [Hz] (b) S21 109 -200 8x109 Fig. 4: Comparison between measurements and model of test structure (i) with the transmission line length l1 = 25 mm. 0 0 100 -40 -50 SOLT calibration on-board calibration -60 3x105 106 107 108 Freq [Hz] (a) S11 109 100 50 0 -50 -100 8x109 50 -20 0 SOLT calibration on-board calibration -30 -40 3x105 106 107 108 Freq [Hz] (b) S21 109 -50 -100 8x109 Fig. 5: Comparison of the de-embedded S-parameters of the 68 pF capacitor for the SOLT and the on-board calibration. line Z1 models the coaxial section of the connector. The right-angle transition between the SMA connector and the CPWG line on the PCB is represented by the components C2 and L2 , and the part of the transmission line that is directly below the connector is modelled by the components C3 and L3 . The values of the model parameters are obtained by minimizing the absolute difference between the measured and modelled S11 and S21 parameters of the test structures (i), described in Section II-A, using the random and gradient optimization solvers in ADS. The initial values of the optimized variables are chosen according to [13]. The optimization takes several minutes on an Intel R Core R i7 CPU @ 3.0 GHz and 12 GB of RAM. The transmission lines l1 , l2 , l3 are modelled by optimizing the CPWG model [12]. The lengths of these transmission lines are defined by a single variable lCP W G , such that the modelled line lengths are equal to lCP W , (lCP W G + 25 mm), (lCP W G + 75 mm). In this way the coupling effects between the SMA connector and the CPWG lines are taken into account consistently. The dielectric loss is modelled by optimizing the loss tangent of the substrate tan δ. The variables that are not optimized are the nominal impedance Z1 = 50 Ω of the SMA connector, the previously obtained substrate permittivity εr = 4.848, the copper conductivity σCu = 58.5·106 S/m and copper 112 thickness tCu = 35 µm. The random optimizer is used, followed by the gradient optimizer, with 100 iterations each. The final values of the optimized variables of the SMA connector model are given in Table III. The comparison between the measurements and the model of the test structures (i) with the line length l1 = 25 mm is given in Fig. 4. III. M EASUREMENT RESULTS The SMA connector and the CPWG line on each side of the DUT in the test structures (ii) is referred to as the fixture. In order to de-embed the fixture from the S-parameter measurements, the model of the fixture is built using the extracted characteristics of the substrate and the model of the SMA connector. The fixture return loss is better than 10 dB in the frequency range up to 4.9 GHz (the worst case return loss is 6.4 dB). The fixture insertion loss is less than 1.8 dB in the entire frequency range up to 8 GHz. The fixture model enables de-embedding of the fixture from the S-parameter measurements obtained by the VNA calibration kit, i.e. SOLT calibration up to the end of the coaxial cables. The on-board calibration standards, i.e. the custom calibration kit, are soldered on the PCB pads where the component to be characterized will be soldered later. 
Then, each on-board standard is measured separately according MIPRO 2016/MEET Phase [deg] -30 -10 Magnitude [dB] Magnitude [dB] -20 Phase [deg] -10 to the SOLT procedure and stored into the VNA memory. The short and through standards are implemented using 0 Ω resistors; the load standard are two 100 Ω resistors in parallel, and the open standard are two 1 pF capacitors in parallel. These SMD components have the same 0603 package as the characterized capacitors. In the remainder of this paper, the measurements obtained using the custom calibration kit are referred to as “on-board calibration”, while the DUT measurements obtained by de-embedding the fixtures are referred to as “SOLT calibration”. The comparison between these two measurement methods is shown in Fig. 5, for the 68 pF capacitor in the SMD-0603 package. IV. M ODELLING RESULTS The models of the SMD capacitors are extracted using the π-model shown in Fig. 6. The admittances Y1 , Y2 represent the parasitic coupling between the soldering pads and ground, while the admittance Y3 represents the intrinsic SMD component. These admittances can be expressed in terms of the y-parameters using: Y3 = −(y12 + y21 )/2, Y1 = y11 − Y3 , Y2 = y22 − Y3 . The y-parameters are obtained from the de-embedded S-parameters using well-known relations [14]. The admittances Y1 and Y2 are modelled first. The results show that these admittances are similar. In order to simplify the model, they are modelled as identical components according to the mean value Y1,avg = (Y1 + Y2 )/2. The admittance Y1,avg is shown in Fig. 7a for the 68 pF capacitor measured using SOLT calibration. The capacitive character of this admittance can be observed. The value of Y1,avg admittance is similar for on-board calibration in magnitude and phase. Both Y1 and Y2 are modelled as the capacitances CP . The value of the capacitance can be determined from the imaginary part of Y1,avg by using expression: Im{Y1,avg } Im{Y1,avg } = . (3) ω 2πf The measured value of CP obtained using Eq. (3) is frequency-dependent, as it is shown in Fig. 8. In order to obtain an optimal value of this parameter in the model, the value of CP is optimized to minimize the mean square error with respect to the measured value in the frequency range where the frequency dependence of CP is approximately flat: from 10 MHz to 8 GHz. The admittance Y3 is shown in Fig. 7b. The series resonance behaviour can be observed. Therefore, Y3 is modelled using the series RLC model. Its impedance is equal to: 1 1 Z3 = = RS + j(ωLS − ), (4) Y3 ωCN where CN is the nominal capacitance, LS is the series inductance, and RS is the equivalent series resistance (ESR). The value of the nominal capacitance is calculated by neglecting the inductive component ωLS in Eq. (4) for low frequencies: 1 1 Im{Z3 } ≈ − =− (5) ωCN 2πf CN CP = MIPRO 2016/MEET Fig. 6: π-model for SMD components extracted using y-parameters. The nominal capacitance CN can be expressed as: CN = −1 , 2πf Im{Z3 } (6) The value of CN in the model is optimized in the frequency range from 1 MHz to 10 MHz. Analogously, the value of the series inductance LS is calculated by neglecting the capacitive component 1/ωCN in Eq. (4) for high frequencies: Im{Z3 } ≈ ωLS = 2πf LS . (7) The value of LS is equal to: LS = Im{Z3 } . 2πf (8) The value of LS in the model is optimized in the frequency range from 1 GHz to 8 GHz. The value of the modelled series resistance RS is optimized with respect to the real part of Z3 in Eq. (4). 
The resistance value is optimised in the frequency range from 10 MHz to 100 MHz. All optimizations are run in ADS using the random optimizer with 100 iterations. The frequency range for the optimization is chosen manually for each element of the model, and for each modelled capacitor. The final schematic of the SMD capacitor model is shown in Fig. 9. The optimized values of the model elements are summarized in Table IV for the different capacitor values, as well as for both calibration methods. The comparison between the values of the model elements in Fig. 9 obtained by measurements and the optimized values is shown in Fig. 10, and the comparison of the S-parameters is shown in Fig. 11. V. D ISCUSSION The measured S-parameters of the test structures can be successfully used to build models of the SMA connector, the CPWG transmission lines and the SMD capacitors. The S-parameters obtained using the different calibration methods (SOLT and on-board), shown in Fig. 5, are consistent for frequencies up to 500 MHz. In the higher frequency range, the general trend of the S-parameters is consistent. The maximum model error in the magnitude of S21 is under 1 dB. The model topology presented in Section IV is built based on the y-parameters obtained from measurements. It can be considered a physical model, since each element 113 6 90 4 Magnitude Phase 2 106 107 Freq [Hz] 108 85 80 109 Magnitude [dBS] 95 8 Phase [deg] Magnitude [mS] 10 0 105 20 100 90 0 60 30 -20 0 -40 -30 -60 -60 Magnitude Phase -80 3x105 106 10 7 (a) Y1,avg = (Y1 + Y2 )/2. Phase [deg] 12 -90 8 10 Freq [Hz] 10 9 8x109 (b) Y3 . Fig. 7: Values of the admittances in the π-model obtained from the de-embedded S-parameters measured using SOLT calibration for 68 pF capacitor. 10-12 Capacitance [F] on-board calibration SOLT calibration 10-13 5 3x10 106 Fig. 9: Model of the SMD capacitors. 107 108 Freq [Hz] 109 8x109 Fig. 8: Extracted value of the capacitance CP in Eq. (3) for the 1 nF capacitor obtained using both calibration methods. has a physical interpretation. The capacitors CP model the parasitic capacitive coupling between the soldering pads and ground planes. The values for all capacitors and calibration methods are around 200 fF. This is in the expected order of magnitude for the SMD-0603 package. The capacitance to ground CP is similar for both calibrations with the biggest difference between SOLT and on-board calibrations of 20% for the 10 nF capacitor. The element LS represents the equivalent series inductance and its value is consistent and is between 1.1 nH and 1.6 nH. The element RS represents the equivalent series resistance, and its value is between 0.1 Ω and 0.3 Ω. VI. C ONCLUSION The SMD capacitor models are extracted from the measured S-parameters. The fixture consisting of SMA connectors and CPWG transmission lines is de-embedded by, firstly, standard VNA calibration enhanced by fixture modelling and, secondly, by on-board calibration standards. The two calibration methods are compared. The on-board calibration is consistent with the de-embeded results for frequencies up to 8 GHz. The parasitic element values of the modelled capacitors are similar for different capacitors in the same SMD-0603 package. The extracted models can be used to improve the design time for electronic circuits in high frequency applications. 114 TABLE IV: Model components value for SOLT and on-board calibration. 
nominal C C = 68 pF C = 1 nF C = 10 nF calibration SOLT on-board SOLT on-board SOLT on-board CP /fF LS /nH RS /mΩ CN /pF fY3 res /MHz 283.2 1.144 135.9 68.20 517.2 273.5 1.155 117.9 68.21 517.2 247.1 1.427 157.5 995.6 133.4 202.9 1.417 149.2 993.3 134.2 195.1 1.615 126.1 9187.4 40.65 221.4 1.617 127.4 9181.2 41.12 ACKNOWLEDGMENT This research is funded by ams AG, Premstaetten, Austria. R EFERENCES [1] P. Muthana, A. E. Engin, M. Swaminathan, R. Tummala, V. Sundaram, B. Wiedenman, D. Amey, K. H. Dietz, and S. Banerji, “Design, Modeling, and Characterization of Embedded Capacitor Networks for Core Decoupling in the Package,” IEEE Trans. Adv. Packag., vol. 30, no. 4, pp. 809–822, Nov 2007. [2] A. Alaeldine, R. Perdriau, M. Ramdani, J. Levant, and M. Drissi, “A Direct Power Injection Model for Immunity Prediction in Integrated Circuits,” IEEE Trans. Electromagn. Compat., vol. 50, no. 1, pp. 52–62, Feb 2008. [3] Infineon, PFC boost converter design guide, Appl. Note, February 2016. [4] K. Naishadham, “Experimental Equivalent-Circuit Modeling of SMD Inductors for Printed Circuit Applications,” IEEE Trans. Electromagn. Compat., vol. 43, no. 4, pp. 557–565, Nov 2001. [5] B. Pejcinovic, V. Ceperic, and A. Baric, “Design and Use of FR-4 CBCPW Lines In Test Fixtures for SMD Components,” in ICECS, Dec 2007, pp. 375–378. [6] P. R. B. Vitor, M. J. Rosario, and J. C. Freire, “Modelling SMD Capacitor for Microstrip Circuits,” in MELECON, Apr 1989, pp. 717–720. [7] K. Technologies, Using the Time-Domain Reflectometer, Appl. Note, August 2014. MIPRO 2016/MEET 100 Capacitance [F] on-board calibration SOLT calibration SOLT model value Resistance [Ohm] 10-12 10-1 10-13 5 3x10 106 107 108 Freq [Hz] (a) CP 109 10-2 5 3x10 106 8x109 107 108 Freq [Hz] (b) RS 109 8x109 10-8 10-9 on-board calibration SOLT calibration SOLT model value 10-10 10-11 8 3x10 109 8x109 Freq [Hz] (c) LS Capacitance [F] Inductance [H] 10-8 on-board calibration SOLT calibration SOLT model value 10-9 on-board calibration SOLT calibration SOLT model value 10-10 5 3x10 106 107 108 Freq [Hz] (d) CN 109 8x109 Fig. 10: Comparison between the values of the parasitic elements of the 1 nF capacitor obtained using both calibration methods, and the optimised value used in the model. -20 0 Magnitude [dB] Magnitude [dB] 0 Measured Modelled -40 -60 3x105 106 -5 -10 107 108 Freq [Hz] (a) S11 109 8x109 -15 3x105 106 Measured Modelled 107 108 Freq [Hz] (b) S21 109 8x109 Fig. 11: Comparison between measured and modelled S-parameters for on-board calibration method for 1 nF capacitor. [8] Cinch Connectivity. RF coaxial, SMA, straight jack, 50 Ohm. [Online]. Available: cinchconnectivity.com/OA MEDIA/specs/pi142-0711-201.pdf [9] Rohde&Schwartz. (2013) R&S ZVA / R&S ZVB / R&S ZVT Vector Network Analyzers Operating Manual. [Online]. Available: www.fer.unizg.hr/ download/repository/ZVA ZVB ZVT Operating.pdf [10] K. Technologies, ADS 2015, Simulation-Analog RF, 2015. [11] Wcalc. (2009) Coplanar Waveguide Analysis/Synthesis Calculator. [Online]. Available: http://wcalc.sourceforge.net/cgibin/coplanar.cgi [12] G. Ghione and C. Naldi, “Parameters of coplanar waveguides with lower common planes,” Electronics Letters, vol. 19, no. 18, pp. 734–735, September 1983. [13] T. Mandic, R. Gillon, B. Nauwelaers, and A. Baric, “Characterizing the TEM Cell Electric and Magnetic Field Coupling to PCB Transmission Lines,” IEEE Trans. on Electromagn. Compat., vol. 54, no. 5, pp. 976–985, Oct 2012. [14] D. Pozar, Microwave Engineering. Wiley, 2004. 
MIPRO 2016/MEET 115 Impact of Capacitor Dielectric Type on the Performance of Wireless Power Transfer System D.Vinko and P. Oršolić University of Osijek, Faculty of Electrical Engineering, Department of Communications, Osijek, Croatia davor.vinko@etfos.hr Abstract – In loosely coupled wireless power transfer systems, the efficiency is directly affected by the quality factor of the resonant LC tank in both the transmitter and the receiver. This paper studies the impact that the capacitor dielectric type has on the quality factor of the resonant LC tank. Experimental investigation is conducted for low ESR capacitor types and the performance of the wireless power transfer system is evaluated. Focus of the experimental evaluation is placed on the receiver’s end, i.e. the evaluated parameters are current-voltage characteristics of the receiver and maximum power values. Capacitors are also compared with respect to their ESR values on different frequencies and price. The results show that the capacitor type has a significant impact on the performance of the wireless power transfer system. The correct choice of capacitor type can increase the efficiency of power transfer and the maximum achievable power up to 400 %. I. INTRODUCTION Wireless power transfer (WPT) is lately gaining more and more attention with wide spectrum of possible applications [1]-[3]. Applications differ in amount of transferred power, the impedance of the load circuit, the distance of wireless power transfer, the operating frequency, and these are just some of the parameters that can be altered to optimize power transfer. Analysis and the design of the WPT system in a given application is most often based on power and efficiency maximization [4], [5]. In systems that are utilizing wireless power transfer, a quality factor (Q factor) of an LC resonant tank has a significant impact on system performance [6]-[8]. WPT system consists of a transmitter and a receiver, Fig. 1. In WPT system the power is wirelessly transferred from the transmitter to the receiver through alternating magnetic field. Transmitter is represented by an AC voltage source U which drives resonant tank formed by C1 and L1. Receiver is represented with resonant tank formed by L2 and C2, and resistive load RLOAD. M represents the mutual inductance of L1 and L2. In this paper we focus on a loosely coupled inductive WPT system [9], [10], which is characterized by coupling coefficient k below 0.1 [11]. With such setup, the resonant frequency of the WPT system is not significantly affected by mutual inductance of loosely coupled coils of the transmitter and the receiver. Therefore, with low coupling coefficient k, both LC tanks can be separately designed for the same desired resonant frequency, and the changes in mutual inductance M (due to different physical placement and alignment between coils) will not significantly affect the efficiency of wireless power transfer. Factors that do affect the performance of WPT system are parasitic parameters in resonant tank. Fig. 1 shows these parameters for the resonant tank of the receiver. Inductor L2 has parasitic resistance RL and parasitic capacitance CL, while capacitor C2 has following parasitic parameters: parallel resistance RLeak, equivalent series inductance ESL and equivalent series resistance ESR. Both ESL and ESR are frequency dependent. 
Values of parasitic capacitance CL and equivalent series inductance ESL affect the resonant frequency, while values of parasitic resistances, RL, RLeak and ESR, affect the quality factor of each component (L2 and C2), and consequently the quality factor of the resonant tank. With that in mind, the efficiency of the wireless power transfer is predominantly affected by the quality factor of components used. Quality factor of the component (L or C) is the ratio of stored energy and dissipated energy, which also corresponds to the ratio of its reactance and resistance at a given frequency f. U L1 C1 L2 M Transmitter RLOAD C2 Receiver RLeak C2 L2 CL ESR RL k= M < 0.1 L1 L2 (1) ESL Figure 1. Wireless power transfer system This work was sponsored by J. J. Strossmayer University of Osijek under project IZIP-2014-104 “Wireless power transfer for underground and underwater sensors”. 116 MIPRO 2016/MEET 2π × f ×L RL (2) (3) 1 QC = 2π × f ×C ×ESR Papers that address the impact of capacitor selection on system performance [12], [13] identify the ESR as the main source of losses. In WPT system (Fig. 1), the influence of RLeak on system performance is negligible in comparison to the influence of ESR, which is why RLeak is not included in (3). ESR VS. CAPACITOR DIELECTRIC TYPE Parasitic parameters of a capacitor (RLeak, ESL and ESR) depend on a type of the dielectric used. In this paper we investigate the impact of the ESR on the performance of the WPT system. In order to be used in LC tank of a WPT system capacitor must be non-polarized. Most common non-polarized capacitor types are FILM and ceramic capacitors and both types are characterized as low-ESR capacitor types. Table I gives the list of all capacitors used in this paper. All capacitors are 10 nF capacitors with voltage rating of 100 V. Three capacitor types are evaluated: Ceramic, MLCC (Multi-Layer Ceramic Capacitor) and FILM capacitors. Besides the capacitor type, dielectric type is stated for each capacitor. Three FILM capacitors with same dielectric type are evaluated (dielectric types marked PET1, PET2 and PET3 in Table I). They differ in manufacturer, which leads to significant difference in unit price which is also given for each capacitor. ESR value for each capacitor is measured using handheld LCR meter UT612, which can measure ESR TABLE I. @1 kHz FILM, PET1 FILM, PET2 FILM, PET3 10 1 1.00E+03 @10 kHz Ceramic, Y5P 1.00E+04 Frequency [Hz] 1.00E+05 Figure 2. ESR-frequency characteristics of different capacitor types value on predefined frequencies up to 100 kHz. These values are given in Table I and also shown on Fig. 2. It is important to note than ESR varies with frequency, and that this change is not linear and it varies with type of dielectric. Manufacturers commonly supply a single ESR value at frequency which differs from manufacturer to manufacturer. These common frequencies range from 1 kHz up to 100 MHz. With frequency dependent ESR, a provided single ESR value does not give the complete insight in capacitor performance. III. MEASUREMENTS During measurements, evaluated capacitors were placed (one at a time) in LC tank, parallel to a 100 µH air coil. Due to tolerance in capacitance value, the resonant frequency fR of LC tank varied for each capacitor, and is given in Table II. The maximum deviation of resonant frequency is within 10%, which is the maximum tolerance of tested capacitors. In WPT system (Fig. 1), both LC tanks (L1, C1 and L2, C2) must be adjusted to the same resonant frequency. 
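The quality-factor relations in (2) and (3), QL = 2πfL/RL and QC = 1/(2πfC·ESR), together with the tank quality factor QLC = 2πfL/(RL + ESR) used later in the paper, can be evaluated directly. The sketch below uses illustrative values of the order reported in Tables I and II (coil resistance and ESR near the operating frequency), so the numbers are indicative only.

import math

def q_inductor(f, L, RL):
    """Inductor quality factor, eq. (2): QL = 2*pi*f*L / RL."""
    return 2.0 * math.pi * f * L / RL

def q_capacitor(f, C, esr):
    """Capacitor quality factor, eq. (3): QC = 1 / (2*pi*f*C*ESR)."""
    return 1.0 / (2.0 * math.pi * f * C * esr)

def q_tank(f, L, RL, esr):
    """Quality factor of the series-loss LC tank: QLC = 2*pi*f*L / (RL + ESR)."""
    return 2.0 * math.pi * f * L / (RL + esr)

# Illustrative values of the order reported in the paper
f   = 160e3    # Hz, close to the measured resonant frequencies
L   = 100e-6   # H, receiver coil
C   = 10e-9    # F, evaluated capacitors
RL  = 0.33     # ohm, coil resistance at resonance (skin effect included)
esr = 1.55     # ohm, e.g. a low-ESR MLCC around 100 kHz

print(f"QL = {q_inductor(f, L, RL):.0f}, QC = {q_capacitor(f, C, esr):.0f}, "
      f"QLC = {q_tank(f, L, RL, esr):.0f}")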
The resonant frequency of receivers LC tank (L2, C2) changes for different capacitor types. To avoid the necessity of adjusting the resonant frequency of transmitters LC tank (L1, C1 in Fig. 1) for each tested capacitor, a measurement setup shown in Fig. 3 is used. LC tank is not used in transmitter, but only a transmitting coil L1. This also Dielectric type ESR [Ω] Price [€] MLCC; X8R FILM, PP 100 TABLE II. COMPARISON OF USED CAPACITORS C = 10 nF; 100 V MLCC, C0G,NP0 MLCC, X7R MLCC, Z5U For LC tank, the Q factor represents the ratio of stored and dissipated energy in each period (cycle) of resonant frequency. Dissipated energy increases with increase of parasitic resistances (RL, RLeak and ESR), resulting with lower Q factor of the resonant tank, which decreases the efficiency of wireless power transfer. II. 1000 ESR value [Ω] QL = MAXIMUM POWER POINT ANALYSIS RRES=RLOAD @MPP [Ω] fR [kHz]* RLC [Ω] @100 kHz ESR @100 kHz [ Ω] ** QLC @MPP Capacitor type Dielectric type MLCC C0G,NP0 1.48 3.36 1.3 1.58 C0G,NP0 6500 159.9 1.55 1.58 64,7 MLCC X8R 0.82 1.73 1.32 1.60 X8R 6000 161.6 1.72 1.60 59,1 FILM PP 0.52 5.75 1.92 1.64 PP 6000 161.2 1.71 1.64 59,2 FILM PET1 2.55 60.1 13.24 3.47 PET1 4000 163.9 2.65 3.47 38,8 FILM PET2 0.76 71.8 16.7 4.09 PET2 3500 163.7 3.02 4.09 34,0 FILM PET3 0.11 67.7 16.62 4.13 PET3 3500 165.9 3.10 4.13 33,6 MLCC X7R 0.49 137.6 18.9 3.61 X7R 2000 161.0 5.12 3.61 19,8 MLCC Z5U 0.54 152.9 17.97 3.76 Z5U 2000 167.0 5.51 3.76 19,1 Ceramic Y5P 0.29 207 27.6 4.86 Y5P 1000 163.4 10.54 4.86 9,7 * Resonant frequency fR is measured for parallel LC tank with L = 100 μH ** ESR values correspond to values given in Table I MIPRO 2016/MEET 117 U L1 M L2 RLOAD C2 a) dielectric type, and FILM capacitor with PP dielectric type. When compared to ESR measurement (Fig. 2), the same capacitors have the lowest ESR values on all tested frequencies. Important parameter in WPT systems is a maximum instantaneous power that can be supplied by the receiver. Fig. 5 shows output power of a WPT receiver for different values of load resistance RLOAD and different capacitor types. Highest output power is obtained for the same three capacitor types that have the lowest ESR values. For each tested capacitor, the maximum power is available for a different value of load resistance. At maximum power point the load resistance is matched to the resistance of LC tank. At resonant frequency, the resistance of LC tank (RRES) can be expressed as: b) RRES = RLOAD @MPP = ωL ×QLC = Figure 3. Measurement setup a) schematic and b) photograph reduces the impact of mismatch in resonant frequencies of the transmitter and the receiver on measurement results. Output current [mA] A current-voltage output characteristic of the WPT receiver is measured for each tested capacitor, Fig. 4. Result show that with different dielectric type, a maximum output voltage of the WPT receiver varies from 25 V to 105 V. The variations of maximum output current are less prominent, ranging from 17 mA to 20 mA. There are three capacitor types that performed significantly better the rest, MLCC capacitors with C0G, NP0 and X8R 20 18 16 14 12 10 8 6 4 2 0 MLCC, C0G, NP0 MLCC, X8R FILM, PP FILM, PET1 FILM, PET2 FILM, PET3 MLCC, X7R MLCC, Z5U Ceramic, Y5P 0 50 100 Output voltage [V] 150 Figure 4. 
Current-voltage output characteristics of a WPT receiver for different capacitor types 118 Q LC = 1 ωL , = RLC ωC ×RLC (4) (5) where QLC represents the quality factor of the resonant tank, RLC represents parasitic resistance of LC tank (RL + ESR), and ω is the angular frequency 2πf. Since capacitor and inductor have equal reactance at resonant frequency, two equivalent expressions are given in (4) and (5). To extract resistive loses of LC tank from measurement results, the following expression is derived from (4) and (5): R LC = (ωL )2 R RES = (6) 1 (ωC )2 ×RRES The total parasitic resistance RLC can be calculated by using measured values for resonant frequency fR, and the resistance of LC tank RRES is obtained experimentally by matching load resistance at maximum power point (4). Output power plots for each capacitor (Fig. 5) are drawn using 13 measurement points that do not necessarily correspond to the maximum power point. Therefore the values for RRES given in Table II are extracted from output power plots (Fig. 5) and are best approximation of the Output power [mW] The following component values are used for measurement: L1 = 180 µH (single layer air coil with diameter of 60 cm, 10 turns of 1.6 mm wire), L2 = 100 µH (single layer air coil with diameter of 11 cm, 25 turns of 1 mm wire), C2 = 10 nF (different capacitor types as given in Table I), RLOAD value is varied from 100 Ω to 1 MΩ in 13 discreet steps (100 Ω, 220 Ω, 560 Ω, 1 kΩ, 2.2 kΩ, 4.7 kΩ, 10 kΩ, 22 kΩ, 47 kΩ, 100 kΩ, 220 kΩ, 470 kΩ, 1 MΩ). As a voltage source U, Agilent 33250A arbitrary waveform generator is used, generating sine voltage waveform with peak-to-peak amplitude of 20 V. Frequency of the voltage source U is manually adjusted for each tested capacitor and it corresponds to resonant frequency fR values given in Table I. Voltage across RLOAD is measured using oscilloscope (represented as voltmeter in Fig. 3). Coils L1 and L2 are placed on the same plane, with coupling coefficient k under 0.1, which results with mutual inductance M under 14 µH. QLC , ωC 450 400 350 300 250 200 150 100 50 0 1.E+02 1.E+03 1.E+04 1.E+05 1.E+06 Load resistance [Ω] MLCC, C0G, NP0 MLCC, X8R FILM, PP FILM, PET1 FILM, PET2 FILM, PET3 MLCC, X7R MLCC, Z5U Ceramic, Y5P Figure 5. Output power of a WPT receiver for different capacitor types MIPRO 2016/MEET maximum power point. Calculated values for RLC are given in Table II. REFERENCES Air coil used in LC tank has a DC resistance of 0.2 Ω. At resonant frequency of the LC tank, due to the skin effect, the coil resistance RL increases. Calculated value of the coil resistance RL, at resonant frequency, equals approx. 0.33 Ω. When compared with RLC values given in Table II, the coil resistance makes from 3 % up to 20 % of the total parasitic resistance RLC. The remaining more than 80% of RLC is due to capacitor losses. [1] ESR values measured at 100 kHz are the closest to the resonant frequency (163 kHz ± 2.5%) and are also given in Table II. It can be noted that there is a good correlation between ESR values at 100 kHz and total parasitic resistance RLC. Some discrepancies that occur can be attributed to the non-linear behavior of the ESR. Quality factor of resonant LC tank QLC is calculated (5) and the values are given in Table II. [4] IV. [3] [5] [6] CONLUSION Conducted experimental investigation has shown that the choice of capacitor type has a significant impact on performance of wireless power transfer system. 
Equivalent series resistance (ESR) of capacitor was identified as dominant factor which determines the performance of WPT system. Three capacitor types were tested: Ceramic, MLCC (Multi-Layer Ceramic Capacitor) and FILM capacitors, all considered to be capacitors with low ESR value. Measurements also showed that ESR value is more affected by the dielectric material than the capacitor type. Best results are achieved by using C0G, NP0 and X8R dielectric for MLCC capacitors, and PP dielectric for the FILM capacitors. With use of those dielectric types system performance is improved by 400 %, with respect to Ceramic capacitor with Y5P dielectric, and the quality factor of resonant LC tank shows six-fold improvement. All tested capacitors were in price range from 0.11 to 2.55 €, and the top three range from 0.52 to 1.48 €. The best performance to price ratio is obtained for FILM capacitor with PP dielectric. MIPRO 2016/MEET [2] [7] [8] [9] [10] [11] [12] [13] Y. Yang, X. Xie, G. Li, Y. Huang, Z. Wang, “A Combined Transmitting Coil Design for High Efficiency WPT of Endoscopic Capsule,” IEEE International Symposium on Circuits and Systems (ISCAS), 2015, pp. 97 – 100. D. Futagami, Y. Sawahara, T. Ishizaki, I. Awai, “Study of high efficiency WPT underseas,” IEEE Wireless Power Transfer Conference (WPTC), 2015, pp. 1 – 4. L. Olvitz, D. Vinko, T. Švedek, “Wireless Power Transfer for Mobile Phone Charging Device,” Proceedings of the 35th International Convention MIPRO, 2012, pp. 141 – 145. G. Kim, B. Lee, “Analysis of Magnetically Coupled Wireless Power Transfer between Two Resonators Based on Power Conservation,” IEEE Wireless Power Transfer Conference (WPTC), 2014, pp. 231 – 234. Y.-K. Jung, B. Lee, “Design of Adaptive Optimal Load Circuit for Maximum Wireless Power Transfer Efficiency,” Asia-Pacific Microwave Conference Proceedings (APMC), 2013, pp. 1221 – 1223. O. Jonah, S. V. Georgakopoulos, M. M. Tentzeris, “Optimal Design Parameters for Wireless Power Transfer by Resonance Magnetic,” IEEE Antennas and Wireless Propagation Letters, Vol. 11, pp. 1390-1393, 2012. I. Awai, T. Ishizako, “Transferred Power and Efficiency of a Coupled-resonator WPT system,” IEEE MTT-S International Microwave Workshop Series on Innovative Wireless Power Transmission: Technologies, Systems, and Applications (IMWS), 2012, pp. 105 – 108. M. Dionigi, M. Mongiardo, “Coaxial Capacitor Loop Resonantor for Wireless Power Transfer Systems,” The 7th German Microwave Conference (GeMiC), 2012, pp. 1 – 4. F. Lu, H. Zhang, H. Hofmann, C. Mi, “A High Efficiency 3.3 kW Loosely-Coupled Wireless Power Transfer System Without Magnetic Material,” IEE Energy Conversion Congress and Exposition (ECCE), 2015, pp. 2282 – 2286. J. A. Russer, P. Russer, “Design Considerations for a Moving Field Inductive Power Transfer System,” IEEE Wireless Power Transfer (WPT) Conference, 2013, pp. 147 – 150. K. A. Grajski, R. Tseng, C. Wheatley, “Loosely-Coupled Wireless Power Transfer: Physics, Circuits, Standards,” Proceedings of IMWS-IWPT, 2012, pp. 9 – 14. S. Karys, “Selection of Resonant Circuit Elements for the ARCP Inverter,” 10th International Conference on Electrical Power Quality and Utilisation EPQU, 2009, pp. 1 – 6. H. J. H. Since, S. Taninder S., B. J. Kai, “The Impact of Capacitors Selection and Placement to the ESL and ESR,” International Symposium on Electronics Materials and Packaging EMAP, 2005, pp. 258 – 261. 
119 Switching Speed and Stress Analysis for Fixed-fixed Beam Based Shunt Capacitive RF MEMS Switches Anoushka Kumar A, Resmi R LBS Institute of Technology for Women, Thiruvananthapuram, India anoushkakumar4@gmail.com, resmilbs@gmail.com Abstract—In this paper, effect of different materials on the reliability of RF MEMS shunt capacitive switch is analyzed. The effective von-Mises stress analysis is done on a fixed-fixed beam switch for materials like Titanium, Platinum, Gold, Aluminium and Copper. The maximum value of von-Mises stress obtained for each material was less than their corresponding ultimate tensile strength. Membrane using Titanium, Copper and Platinum can withstand more number of switching cycles. The variation in switching speed of a fixed-fixed beam structure using Aluminium, Platinum, Titanium, Copper and Gold is also analysed. Membranes using Titanium and Copper give better performance with respect to switching speed and reliability. Keywords—Fixed-fixed strength I. beam; von-Mises stress; Tensile INTRODUCTION voltage and hot switching. Fig. 1 shows the classification of RF MEMS switches. Figure 1. Classification of RF MEMS switches Fig. 2 shows the shunt and series configurations of RF MEMS switches [2]. Radio Frequency Micro Electro Mechanical System (RFMEMS) switches have been an attractive field for both scientific research and industry due to its promising applications in RADAR systems, Satellite communication systems, Wireless communication systems and Instrumentation systems. Compared to traditional GaAs FET and p–i–n diode switches RF MEMS switches have negligible power (few µ-watts) consumption, low insertion loss, high isolation, much lower intermodulation distortion, low cost and light weight[1]. RF MEMS switches replaced the traditional GaAs FET and p–i–n diode switches in RF and microwave systems. Typically MEMS switches are manufactured using surface micro-machining processes. In RF-MEMS switches, the mechanical movement of switch membrane which can be either a fixed–fixed beam or a cantilever beam creates a short circuit or an open circuit in transmission line. This mechanical movement is achieved using electrostatic, piezoelectric, magnetostatic or thermal actuation. Even though electrostatic method requires a high actuation voltage it is the most prevalent one due to its near zero power consumption, small electrode size, thin layers and short switching time. In electrostatic actuation, electrostatic force is generated between fixed electrode and movable membrane for switching operation. The main limitations of RF MEMS switches include slow switching speed, high actuation 120 Figure 2. Shunt and series circuit configuration of MEMS switches Various designs are existing on RF MEMS switches to tackle the limitations faced by switches. Stiff ribs around the membrane helps to reduce stiction and buckling effect in switches [3]. Meander in membrane reduces the required actuation voltage of a RF MEMS switch [4]. For highly reliable operation twin layered membrane is preferred [5]. Holes in switch membrane help to reduce actuation voltage and it also reduces squeeze film air damping [6]. MIPRO 2016/MEET II. from reference plane to the edge of membrane and w is the CPW centre conductor width. SHUNT CAPACITIVE SWITCH A. General Structure RF MEMS shunt capacitive switch generally consists of a movable metal bridge, suspended at a height „g0‟ above the center conductor. 
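The electrostatic actuation mentioned above is commonly characterized by the parallel-plate pull-in voltage, quoted as (1) below. A minimal Python sketch using the gap and the 100 µm × 100 µm contact area from Table I as the electrostatic area, together with an assumed spring constant (the paper does not state k), gives an order-of-magnitude estimate only.

import math

EPS0 = 8.854e-12  # F/m, permittivity of free space

def pull_in_voltage(k, g0, A):
    """Parallel-plate pull-in voltage: Vp = sqrt(8*k*g0**3 / (27*eps0*A))."""
    return math.sqrt(8.0 * k * g0**3 / (27.0 * EPS0 * A))

g0 = 0.9e-6           # m, initial gap (Table I)
A  = 100e-6 * 100e-6  # m^2, contact area taken as the electrostatic area (assumption)
k  = 10.0             # N/m, assumed spring constant

print(f"Vp = {pull_in_voltage(k, g0, A):.1f} V")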
The dielectric layer is used above the center conductor, so that the switch membrane does not come into contact with the centre electrode during the actuation. Fig. 3 and Fig. 4 show the switch in OFF and ON states respectively [7]. Figure 5. Equivalent C–L–R circuit model of shunt capacitive switch B. Device Geometry Fixed-fixed beam is usually preferred for the beam structure in MEMS switches because it provides better stability and much lower sensitivity to stress. Fig. 6 shows the 3D structure of a fixed-fixed beam switch modeled in COMSOL Multiphysics. The switch consists of a square plate which is suspended above a thin film of Silicon nitride (relative dielectric constant 7.5). Silicon nitride is the commonly used dielectric layer in shunt capacitive switches. There is a silicon counter-electrode below the substrate which is grounded. Four rectangular flexures are used at the corners to anchor the plate to the substrate. The switch dimensions and material parameters used for simulating the switch membrane are shown in Table I and Table II. Figure 3. Cross sectional view of RF MEMS Shunt Switch (OFF) Figure 4. Cross sectional view of actuated MEMS Shunt switch (ON) When a dc voltage is applied, an electrostatic force is produced between the beam and centre conductor. When this voltage exceeds pull-in voltage, the electrostatic forces overcome elastic recovery forces and pulls down the membrane over dielectric layer on signal line. It results in the formation of an open circuit between transmission lines and RF signal which prohibits the transmission of the signal. The pull-in voltage of fixed-fixed beam switch is given by VP = 8k 27ɛ0 A g 30 (1) Here k is the spring constant in N/m, g0 is the initial gap in μm, ε0 is the free space permittivity and A is the electrostatic area in μm2. Fig. 5 shows the equivalent circuit model of shunt capacitive switch [1]. The sections of transmission line are of length l+w/2, where l is the distance MIPRO 2016/MEET Figure 6. Modeled 3D structure of fixed-fixed beam switch in COMSOL Multiphysics 121 TABLE I. SWITCH SPECIFICATIONS Component Length(µm) Width(µm) Depth(µm) Membrane frame Contact area 220 100 1 100 100 1 Flexures 60 5 1 Dielectric layer 220 100 0.1 Gap - - 0.9 III. STRESS ANALYSIS USING FEM The von-Mises stress analysis is mandatory to check whether the membrane will withstand a given load condition. If the maximum stress increases for a given gap height from the stress a material can withstand, it indicates the failure of design[8]. The effective von-Mises stress in membrane for the required force at specific spring constant is analyzed for some commonly available metals like Titanium, Platinum, Gold, Aluminium and Copper. Simulation of stress gradient is done using Finite Element Method in COMSOL Multiphysics. Fig. 7 to Fig. 11 demonstrates the stress distribution in membrane using different materials. From Fig.7 to Fig.11, the colour variation across the surface of membrane of switch indicates the corresponding stress as provided in the colour chart at the right side of the plot. von-Mises stress profile obtained is different for different materials. The maximum stress can be seen along the flexures and minimum stress is obtained in the contact area. Figure 7. von-Mises stress demonstrating maximum stress of 68MPa in Copper membrane Fig. 8 shows the maximum and minimum value of vonMises stress obtained in membrane using Titanium Figure 8. von-Mises stress demonstrating maximum stress of 68MPa in Titanium membrane TABLE II. 
SWITCH PARAMETERS Material Young‟s Modulus(GPa) Poisson‟s ratio Density(kg/m3) Copper 110 0.35 8700 Titanium 113 0.29 4540 Gold 79 0.44 19320 Aluminium 70 0.32 2700 Platinum 170 0.39 21450 Fig. 9 shows the maximum and minimum value of vonMises stress obtained in membrane using Gold Fig. 7 shows the maximum and minimum value of vonMises stress obtained in membrane using Copper. Figure 9. von-Mises stress demonstrating maximum stress of 48MPa in Gold membrane 122 MIPRO 2016/MEET Fig. 10 shows the maximum and minimum value of vonMises stress obtained in membrane using Aluminium The maximum and minimum value of von-Mises stress obtained for membrane using different materials along with their corresponding ultimate tensile strength are shown in Table III. Ultimate tensile strength is the value of maximum stress that a material can withstand while being stretched or pulled before breaking. From Table III it is clear that the maximum von-Mises stress obtained for Titanium, Copper, Gold, Aluminium and Platinum are less than their corresponding ultimate tensile strength. It indicates that design of membrane using these materials is safe for switching operation. Moreover, the maximum von-Mises stress obtained for Titanium, Copper and Platinum were much less than their corresponding ultimate tensile strength. Therefore membrane using these materials can be used for much more switching cycles than the membrane made using Gold or Aluminium. IV. Figure 10. von-Mises stress demonstrating maximum stress of 43MPa in Aluminium membrane Fig. 11 shows the maximum and minimum value of vonMises stress obtained in membrane using Platinum PULL-IN TIME ANALYSIS One of the important parameters to be considered in the design of RF MEMS switches is the switching rate. Switching rate, also referred to as switching speed is the time required for the switch to respond at the output due to the change in control voltage. When a voltage is applied, electrostatic force build up and it tends to pull down the membrane and when this electrostatic force exceeds elastic recovery force of membrane, pull in occurs[9]. The pull-in time required by membrane using different materials is simulated on a fixedfixed beam based shunt capacitive switch. As shown in Fig. 12, the membrane using Platinum takes more than 50µs for a displacement of 0.65µm. Figure 11. von-Mises stress demonstrating maximum stress of 40MPa in Platinum membrane TABLE III. COMPARISON OF STRESS ANALYSIS Material Von-Mises stress Max (MPa) Von-Mises stress Min (MPa) Ultimate tensile strength (MPa) Copper 68 0.59 210 Titanium 68 0.58 240-370 Gold 48 0.39 100 Aluminium 43 0.36 40-50 Platinum 40 0.27 200-300 MIPRO 2016/MEET Figure 12. Displacement of the center of Platinum membrane in the modeled switch When membrane using Copper is used, the time required for pull-in was obtained as 41µs for a displacement of 0.9µm. Fig. 13 shows the switching time required by membrane using Copper. 123 When membrane using Titanium is used, the time required for pull-in was obtained as 39µs for a displacement of 0.9µm. Fig. 16 shows the switching time required by membrane using Titanium. Figure 13. Displacement of the center of Copper membrane in the modeled switch When membrane using Aluminium is used, the time required for pull-in was obtained as 37µs for a displacement of 0.9µm. Fig. 14 shows the switching time required by membrane using Aluminium. Figure 16. Displacement of the center of Titanium membrane in the modeled Switch TABLE IV. 
SWITCHING TIME REQUIREMENT FOR DIFFERENT MATERIALS Figure 14. Displacement of the center of Aluminium membrane in the modeled switch When membrane using Gold is used, the time required for pull-in was obtained as 45µs for a displacement of 0.9µm. Fig. 15 shows the switching time required by membrane using Gold. Figure 15. Displacement of the center of Gold membrane in the modeled switch 124 Material used Switching time(µs) Titanium 39 Gold 45 Aluminium 37 Copper 41 Platinum >50 From Table IV it is clear that among the five materials used for simulation of membrane, Platinum gives the worst switching speed. The switching time required by membrane using Titanium, Copper and Aluminium is less compared to membrane using Platinum and Gold. Thus for better switching speed conditions, Platinum and Gold are not a good choice. V. CONCLUSION Stress analysis is done for a fixed-fixed beam shunt capacitive RF MEMS switch for materials like Titanium, Platinum, Gold, Aluminum and Copper. Membranes using Titanium, Copper and Platinum can withstand much more switching cycles than the membrane made using gold or aluminium. Thus membranes using Titanium, Copper and Platinum are more reliable compared to membranes using Gold and Aluminium. Platinum and Gold are not a good choice for better switching speed conditions. Membranes using Titanium and Copper give better performance with respect to switching speed and reliability. When both reliability and switching speed is considered simultaneously, membranes made up of Titanium is found to be a better choice. MIPRO 2016/MEET VI. [1] [2] [3] [4] [5] [6] [7] [8] [9] REFERENCES Gabriel M Rebeiz, “RFMEMS Theory, Design and Technology”, John Wiley and Sons Limited, New Jersey, 2002 J Jason Yao, “Topical review RF MEMS from a device perspective”, Journal of Micromechanics and Microengineering, 10 (2000), R9–R38 Goldsmith CL, Lin T, Powers B, Wu W, Norvell B, “ Micromechanical membrane switches for microwave applications”, IEEE international MTT-S symposium digest, vol 1, pp 91–94,1995 Muldavin, J.B, Rebeiz, G.M, “High-isolation CPW MEMS shunt switches, part 1: modelling”, IEEE Trans. Microw. Theory Tech., pp1045–1052 , 2000 Singh, T, Khaira, N, Sengar, J, “Stress analysis using multiphysics environment of a novel RF MEMS shunt switch designed on quartz substrate for low voltage applications”,Trans Electr Electron. Mater,vol 2,2013 Singh A, Dhanjr, “Design and modeling of a robust wideband Poly-Si and Au based capacitive RF MEMS switch for millimeter wave applications”, Proc 2nd Int Conf Comput Sci, vol 1,Elsevier, GmbH, PB, pp 108–114,2014 Bhadri, Baiju, “Electromagnetic Analysis of RF MEMS Switch”, International Journal of Engineering Research & Technology, Vol. 3 Issue 9, September 2014 Tejinder Singh, “Computation of Beam Stress and RF Performance of a Thin Film Based Q-Band Optimized RF MEMS Switch”, Transactions on Electrical andEelectronic materials, vol. 16, no. 4, pp. 173-178, august 25, 2015 S. Shekhar, 2 K. J. Vinoy, 3 G. K. Ananthasuresh, “Switching and Release Time Analysis of Electrostatically Actuated Capacitive RF MEMS Switches”, Sensors & Transducers Journal, Vol. 130, Issue 7, pp 77-90, July 2011. 
MIPRO 2016/MEET 125 Performance Analysis of Micromirrors - Lift-off and von Mises stress Sharon Finny, Resmi R LBS Institute of Technology for Women, Thiruvananthapuram, India sharon.finny2010@gmail.com, resmilbs@gmail.com Abstract—Micromirrors are used in numerous applications such as optical switching, projection displays, biomedical imaging and adaptive optics. In this paper, the structural mechanical properties of an electrostatically controlled square shaped micromirror structure are studied. Lift-off analysis and stress analysis is also conducted and materials which can boost the performance of the micromirror are identified. Higher lift-off is observed for silicon-aluminium structure, but it is not recommended due to severe edge displacement and surface deformation. . Structural steel-aluminium material combination shows maximum lift-off with minimal surface deformation and the maximum value of von Mises stress obtained is less than the yield strength of the material thus ensuring safe operation of the structure. The modeling and analysis are done using COMSOL Multiphysics 4.3b. developed during the fabrication process [6]. The plating process usually controls this stress and it can be either compressive or tensile. The prestress level is normally set depending on the lift-off required. II. MODELING OF MICROMIRRORS The micromirror is modeled using COMSOL Multiphysics software. The geometry consists of a centre mirror plate surrounded by cantilever springs on each of the four sides. The 2D geometry of the micromirror is shown in Fig.1. Appropriate boundary conditions are selected, and then meshing is performed on the model to obtain final refined mesh with hexahedral elements. The meshed geometry is shown in Fig. 2. Keywords—micromirror; lift-off; von Mises stress; I. INTRODUCTION The progress of MEMS technology has led to the development of miniaturized optical devices which has a huge impact on a large number of optical applications. These include movable and tunable mirrors, lenses, filters and other optical structures [1]. Micromirrors are a much- researched area in Micro-Opto-Electro-Mechanical Systems (MOEMS). The movable micromirrors are micro optical components used for spatial manipulation of light. The incident light can be reflected to an expected direction by moving the mirror plate so as to modulate phase and/or the amplitude of the incident light [2]. Micromirrors have been used in a wide variety of applications including optical switching, projection displays, endoscopic imaging, barcode readers, laser beam steering etc [3]. Micromirrors are commonly classified on the basis of their method of actuation. The most common actuation methods include electrostatic actuation, electrothermal actuation, electromagnetic actuation and piezoelectric actuation. Electrostatic method of actuation is preferred over the other methods because of its low power consumption, faster response and simplicity of implementation. One of the drawbacks of electrostatic actuated mirrors is that they exhibit pull in, hysteresis and smaller vertical displacement range [4]. Prestressed actuators have been developed to address the issue of pull in, hysteresis and to obtain large vertical displacement [5].These actuators are operated electrostatically and their electromechanical behavior is influenced by the residual stress 126 Figure 1. 2D Geometry of Micromirror. Figure 2. 3D Meshed Geometry of the micromirror. MIPRO 2016/MEET III. SIMULATION RESULTS A. 
Lift-off Analysis The lift-off analysis was performed in COMSOL Multiphysics to examine the effect of changing material combinations on the lift-off process. Fig. 3 - Fig. 8 show the lift-off when different materials such as aluminium, silicon, structural steel, iron, steel AISI 4340 and tungsten were used as the centre mirror plate. Aluminium was the material used for the cantilever beams. The initial normal stress was set as 5 GPa. In Fig. 3 and Fig. 4, the maximum lift-off is present at the edges and edge displacement relative to centre plate is more than 0.2 mm. Also, there is surface deformation at the edges which makes the structure unsuitable for a mirror plate. Figure 5. Total surface displacement for structural steel. The maximum lift-off for iron and steel AISI 4340 plates are less than that of structural steel as shown in Fig. 6 and Fig. 7 respectively. Figure 3. Total surface displacement for aluminium. Figure 6. Total surface displacement for iron. Figure 4. Total surface displacement for silicon. For structural steel plate, the edge displacement is less compared to aluminium and silicon mirror plate as shown in Fig. 5. Steel deforms less because it is stiffer compared to aluminium and silicon. Structural steel has a Young’s modulus of 200 GPa whereas that of aluminium and silicon is 69 GPa and 150 GPa respectively. MIPRO 2016/MEET Figure 7. Total surface displacement for steel AISI 4340. 127 Tungsten has a Young’s modulus of 411 GPa and it is stiffer compared to other materials. Therefore, tungstenaluminum combination exhibits the least displacement as shown in Fig. 8 Figure 8. Total surface displacement for tungsten. In all the material combinations analysed, the maximum displacement was observed at the ends of cantilever beam connected to the mirror plate and minimum at the opposite ends. Table I shows a comparison of maximum lift-off for different material combinations of the centre mirror plate and the cantilever beams. TABLE I. B. Displacement v/s Prestress Analysis Fig. 10 shows the variation of the total centre point deflection with different prestress levels for different centre plate materials. The mirror response is nearly linear in case of all the centre plate materials used. Silicon exhibits greater deflection compared to other materials for all the values of applied prestress, but it shows high displacement at the edges of the centre plate. COMPARISON OF LIFT-OFF FOR DIFFERENT MATERIAL COMBINATIONS Material of centre plate Maximum Displacement or Lift-off (mm) Aluminium 0.3116 Silicon 0.287 Structural Steel 0.2545 Iron 0.2509 Steel AISI 4340 0.2447 Tungsten 0.1476 Fig. 9. shows the total edge displacement for structural steel-aluminium combination which has a maximum value of 0.15 mm and therefore this material combination gives a comparatively stable structure with less surface deformation. In the case of other combinations the lift-off is too low. 128 Figure 9. Edge: Total displacement: structural steel Figure 10. Total centre point displacement v/s prestress for different material combinations. Fig. 11 shows the mirror’s curvature along its centerline for a range of values of prestress. As per the graph, mirror bends more with increasing stress. MIPRO 2016/MEET Figure 13. Surface von Mises stress for different prestress levels. Figure 11. Total centre point deflection for different prestress levels: structural steel plate. C. von Mises Stress Analysis The effective stress at which yielding of any ductile material occur is known as the yield strength. 
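The yield-based safety check formalized in (1) and (2) below reduces to a single comparison once the principal stresses are known. The short sketch that follows uses illustrative principal-stress values together with the 250 MPa yield strength of structural steel quoted later in this section, so it only demonstrates the form of the criterion.

import math

def von_mises(s1, s2, s3):
    """Von Mises stress from the principal stresses, cf. eq. (2)."""
    return math.sqrt(((s1 - s2)**2 + (s2 - s3)**2 + (s3 - s1)**2) / 2.0)

def is_safe(sigma_v, sigma_yield):
    """Safety criterion of eq. (1): sigma_v <= sigma_yield."""
    return sigma_v <= sigma_yield

# Illustrative principal stresses (Pa) and the yield strength of structural steel
s1, s2, s3 = 180e6, 60e6, 10e6
sigma_y = 250e6

sv = von_mises(s1, s2, s3)
print(f"sigma_v = {sv / 1e6:.0f} MPa, safe operation: {is_safe(sv, sigma_y)}")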
In order to avoid static or fatigue failure of the structure, the von Mises stress induced in the material should be less than its yield strength. In (1), σv and σy denote von Mises stress and yield strength respectively. σv ≤ σy (1) IV. CONCLUSION In this paper, an electrostatically controlled micromirrors was modeled and analysed using COMSOL Multiphysics software. The lift-off analysis was carried out for different mirror plate materials. The effect of different prestress levels on the deflection of mirror plate was also studied. The result obtained shows that structural steelaluminium micromirror has high lift–off with minimal surface deformation at the edges which improves the stability of the structure. The maximum value of von Mises stress is less than the yield strength of the material which ensures the safe operation of micromirror structure. The von Mises stress, σv can be expressed in terms of principal stresses σ1 , σ2 , σ3 as given in (2). σv = [ (σ1 −σ2 )2 +(σ2 −σ3 )2 +(σ3 −σ1 )2 2 ] 1⁄ 2 V. REFERENCES [1] Olav Solgard, Asif A. Godil, Roger T. Howe, Luke P. Lee, Yves-Alain Peter, Hanes Zappe, “Optical MEMS:From micromirrors to complex systems,” Journal of MicroElectroMechanical Systems, vol. 23, no.3, June 2014. [2] Xingguo Xiong and Hanyu Xie, “MEMS Dual-mode Electrostatically Actuated Micromirror,” Proceedingsof 2014 Zone 1 Conference of the American Society for Engineering Education (ASEE Zone 1) [3] Y. P. Zhu, W. J. Liu, K. M. Jia, W. J. Liao, and H. K. Xie, “A piezoelectric unimorph actuator based tip-tilt-piston micromirror with high fill factor and small tilt and lateral shift,” Sens. Actuators A, Phys., vol. 167, no. 2, pp. 495–501, 2011. Yuan Ma, Shariful Islam, and Ya-Jun Pan, “Electrostatic torsional micromirror With enhanced tilting angle using active control Methods”, IEEE/Asme Transactions on Mechatronics, Vol. 16, No. 6, December 2011. (2) Fig. 12 shows the distribution of von Mises stress over the surface of the mirror structure with structural steel plate. The von Mises stress is high at the edges and decreases towards the centre of the mirror plate. To ensure the safety of structure against failure, von Mises stress must be less than the yield strength of the material (250 MPa). As per the Fig.13, the structure can be safely operated for prestress values less than 1 GPa. [4] [5] M. Maheswaran and HarNarayan Upadhay, “A Study on Performance Driven Microfabrication methods for MEMS Comb-drive Actuators,”.Journal of Applied Sciences, 12: 920-928, 2012. [6] R. Sulima and S. Wiak, “Modelling of vertical electrostatic comb-drive for scanning micromirrors,” Int. J. Comput. Math. Electr. Electron. Eng., vol. 27, no. 4, pp. 780–787, 2008. Figure 12. Surface von Mises stress: structural steel plate. MIPRO 2016/MEET 129 Material and Orientation Optimization for Quality Factor Enhancement of BAW Resonators Reshma Raj R S and Resmi R L B S Institute of Technology for Women, Thiruvananthapuram, India reshmarajrs1991@gmail.com, resmilbs@gmail.com Abstract - In this paper Quality factor of different Bulk Acoustic Wave resonators using various piezoelectric materials are analyzed. Higher the Quality factor, better is the resonator. High Quality factor provides higher signal to noise ratio, higher resolution and low power consumption. The results indicate that Rochelle Salt possess maximum value of Quality factor of about 2999.729. By changing the orientations maximum value of Quality Factor value obtained is about 42696.38 at 81◦. 
A MEMS based Bulk Acoustic Wave resonator is designed using COMSOL Multiphysics software. Keywords - Bulk Acoustic Wave, Eigen frequency analysis, Quality Factor, BAW resonator I. INTRODUCTION BAW resonator is the core element of the BAW technology. BAW resonator is an electromechanical device in which, with the application of electrical signal a standing acoustic wave is generated in the bulk of piezoelectric material. In simple words, a device consisting of piezoelectric material is sandwiched between two metallic electrodes. To obtain the desired operating frequency, the natural frequency of the material and the thickness are used as design parameters. When the voltage is applied to the top electrode of the resonator, the bulk acoustic mode of the resonator is obtained from eigen frequency analysis. Bulk acoustic wave promises frequency in GHz range when integrated with RF circuits along with small size resonator and filters [4]. The resonator considered in the present paper is of thin film [5], [6] type in which the substrate is partially etched away on the back. The natural frequency of the material and Quality factor are used as design parameters to obtain the desired resonant frequency. FBAR components offer small sizes, low cost, high quality factor, large power operation and compatibility with silicon low-cost process, enabling mass production and filter integration [7]. Fig.1 shows geometry of the modelled resonator. The lowest layer of the resonator is silicon substrate on top of which is the aluminum layer that operates as the ground electrode. A piezoelectric layer is laid over the ground electrode and above this piezoelectric layer is the metal electrode. The Perfectly Matched Layer (PML) effectively simulates the effect of propagation and absorption of elastic waves in the adjoining regions. The resonator dimensions used for simulating the resonator are shown in Table I. The main mode of operation of BAW resonator [1] is the thickness or longitudinal mode, meaning that the bulk acoustic wave reflects the large plate surface and the resonance caused by the wave excited to the thickness direction. A. Thin Film Bulk Acoustic Resonator Thin Film Bulk Acoustic Resonator (FBAR or TFBAR) [2] is a device consisting of a piezoelectric material sandwiched between two electrodes and acoustically isolated from the surrounding medium with thicknesses ranging from several micrometres down to tenth of micrometres resonate in the frequency range of roughly 100 MHz to 10 GHz.In the wireless telecommunication world, film bulk acoustic wave resonators [3] are very promising for use as RF MEMS filters since they can be combined to make up duplexers (transmitter/receiver modules). The FBAR is the chosen device in this study as an example of high frequency MEMS with thin film targeted for use in the ever growing wireless communications industry. 130 Figure 1. Geometry of Thin Film Bulk Acoustic Wave Resonator TABLE I. RESONATOR DIMENSIONS Parameters Thickness Width Silicon layers 7μm 1.7mm top 0.2μm 500μm middle 0.2μm 1.7mm 9.5μm 1.7mm Metal layers (Aluminium) Piezoelectric layer (Rochelle Salt) MIPRO 2016/MEET II. EQUIVALENT CIRCUIT REPRESENTATION For making circuit designs using MEMS resonators, an equivalent electrical circuit that describes their frequency dependent characteristics is needed. The equivalent Butterworth Van Dyke (BVD) lumped equivalent circuit of BAW resonator [8] is shown in Fig.2. 
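Before the eigenfrequency-based extraction described next, it may help to see how the BVD model of Fig. 2 maps onto a frequency response: its admittance, given as (1) in the next section, is the static capacitance C0 in parallel with one series R-L-C motional branch per resonance. The Python sketch below uses purely illustrative element values (not taken from the paper), chosen only so that the series resonance falls near the 220 MHz range discussed later.

import numpy as np

def bvd_admittance(f, C0, branches):
    """Admittance of a Butterworth-Van Dyke circuit: static capacitance C0 in
    parallel with series R-L-C motional branches (one branch per resonance)."""
    w = 2.0 * np.pi * f
    Y = 1j * w * C0
    for R, L, C in branches:
        Y = Y + 1.0 / (R + 1j * w * L + 1.0 / (1j * w * C))
    return Y

# Illustrative element values (assumed, not from the paper)
C0 = 1e-12                           # F, static capacitance
branches = [(5.0, 80e-6, 6.6e-15)]   # (Rm, Lm, Cm) of a single motional branch

f = np.linspace(200e6, 240e6, 2001)
Y = bvd_admittance(f, C0, branches)
print(f"series resonance near {f[np.argmax(np.abs(Y))] / 1e6:.1f} MHz")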
An obvious way to determine the frequency dependent admittance Y(ω) of a MEMS resonator is using a frequency response analysis. In this type of analysis, an actuation force of a frequency is applied to the surface of the resonator and the amplitude response is simulated. Because the Quality factor of MEMS resonators is usually high, this method requires a high density of frequency points to resolve a single resonance of the resonator. Moreover, fitting of Y(ω) is required to extract the equivalent circuit parameters. In this paper an alternative method is described to directly extract the Quality factor from an eigen frequency analysis. The method will be applied to the resonator in Fig.2. Eigen frequency and frequency domain analyses are different studies that are independent of each other. An eigen frequency step will gives list of all the natural frequencies of the system. Then we have to examine the deformed mode shape to see what displacement will be qualitatively. The normalization used in an eigen value problem is somewhat arbitrary. However, it is likely that the largest displacement will corresponds to the lowest mode. III. QUALITY FACTOR ANALYSIS Quality factor is one of the most important characteristics of MEMS resonators, especially for vibrating structures where the resonant frequency variation is monitored. Higher the Quality factor value better the resonator performance. Signal to noise ratio increases and power dissipation decreases. High Quality factor circuits can be used for various wireless applications. Quality factor value is a dimensionless parameter. As we increase resistance value, the frequency range over which the dissipative characteristics dominate the behaviour of the circuit increases. Quality factor is related to the sharpness of the peak. Quality factor is an expression of the cyclic energy loss in an oscillating system. In terms of energy, it is expressed as the total energy stored the system divided by the energy loss per cycle, (2) Q = 2π× Quality factor analysis by using different materials is explained below. In Eigen frequency analysis there are 6 modes of frequencies. The lowest frequency for which these stationary waves are formed is called fundamental harmonic. The modes of oscillation have different shapes for different frequencies [9]. Here in this paper Quality factor is evaluated using eigen frequency, from the equation given below. Quality Factor, = (3) Table II shows the Quality factor analysis of Zinc Oxide. By using Zinc Oxide maximum value of Quality factor obtained is about 1330.08 at an Eigen frequency of 221.42 MHz. TABLE II. QUALITY FACTOR VALUES OBTAINED AT DIFFERENT EIGEN FREQUENCY VALUES USING ZINC OXIDE Figure 2. Equivalent Butterworth Van Dyke lumped element equivalent circuit of BAW resonator Each branch contains a resistor, an inductor, and a capacitor, representing a resonance in the frequency response of the resonator. The admittance of the equivalent circuit is Y(ω) = ∑ +iω (1) Eigen Frequency(MHZ) Quality Factor 217.73 1199.7624 218.7 936.23781 219.55 1188.7183 220.63 936.51987 221.42 1330.0896 222.13 937.4436 Table III shows the Quality factor analysis of Aluminium Nitride. By using Aluminium Nitride maximum value of MIPRO 2016/MEET 131 Quality factor obtained is about 1808.26 at an Eigen frequency of 225.42 MHz. TABLE III. 
QUALITY FACTOR VALUES OBTAINED AT DIFFERENT EIGEN FREQUENCY VALUES USING ALUMINIUM NITRIDE Eigen Frequency(MHZ) Quality Factor 214.8 1616.2625 216.03 1784.7664 221.3 1662.5064 222.09 1784.1761 225.42 1808.2671 Table IV shows the Quality factor analysis of Rochelle Salt. By using Rochelle Salt maximum value of Quality factor obtained is about 2999.72 at an Eigen frequency of 218.95 MHz. TABLE IV. QUALITY FACTOR VALUES OBTAINED AT DIFFERENT EIGEN FREQUENCY VALUES USING ROCHELLE SALT Eigen Frequency(MHZ) Quality Factor 218.76 1165.6162 218.95 2999.7291 219.38 1185.5266 220.26 1083.3847 220.99 1228.3316 Table V shows the Quality factor analysis of Lithium Niobate. By using Lithium Niobate maximum value of Quality factor obtained is about 1360.2926 at an Eigen frequency of 222.88 MHz. TABLE V: QUALITY FACTOR VALUES OBTAINED AT DIFFERENT EIGEN FREQUENCY VALUES USING LITHIUM NIOBATE Eigen Frequency(MHZ) Quality Factor 217.21 665.18125 219.17 1152.154 219.59 619.25414 221.03 1309.1761 222.57 681.24313 222.88 1360.2926 Quality factor Vs Eigen Frequency plot of different materials is shown in Fig.3 below. 132 Figure 3.Quality factor analysis by using different piezoelectric materials From the Quality factor analysis it is clear that by using Rochelle Salt a maximum Quality factor value of about 2999.729 is obtained. Further analysis is done by using Rochelle Salt as a piezoelectric material via varying the orientations. Here, the piezoelectric layer is rotated along the X axis. The eigen-frequencies and respective Quality factor change when the piezoelectric materials inside the resonator were rotated. This is due to the crystallographic effect [10] of piezoelectric materials. However, any angular misalignment from the axis of transduction disrupts the atomic linearity which might lead to acoustic losses at the atomic level, the ensemble of which might reflect on increase in the Quality factor of the resonator. Table VI shows the Quality factor Vs Eigen frequency plot obtained using Rochelle Salt by rotating piezoelectric layer along 30°. Here maximum value of Quality factor obtained is about 17705.13955 at an Eigen frequency of 220.6 MHz. TABLE VI. QUALITY FACTOR ANALYSIS BY ROTATING PIEZOELECTRIC LAYER ALONG 30◦ Eigen Frequency(MHZ) Quality Factor 215.47 5601.25809 218.12 12669.76214 219.6 13173.77809 220.6 17705.13955 222.62 11062.03554 Table VII shows the Quality factor Vs Eigen frequency plot obtained using Rochelle Salt by rotating piezoelectric layer along 50°. Here maximum value of Quality factor obtained is about 40998.42342 at an Eigen frequency of 222.18 MHz. MIPRO 2016/MEET TABLE VII. QUALITY FACTOR ANALYSIS BY ROTATING PIEZOELECTRIC LAYER ALONG 50◦ Eigen Frequency(MHZ) Quality Factor 214.95 19496.29908 216.47 32516.34201 218.1 35075.73867 219.56 29980.49335 222.18 40998.42342 Table VIII shows the Quality factor Vs Eigen frequency plot obtained using Rochelle Salt by rotating piezoelectric layer along 60°. Here maximum value of Quality Factor obtained is about 32554.42619 at an Eigen frequency of 220.42 MHz. Figure 4.Quality Factor analysis at different orientations IV. TABLE VIII.QUALITY FACTOR ANALYSIS BY ROTATING PIEZOELECTRIC LAYER ALONG 60◦ Eigen Frequency(MHZ) Quality Factor 214.36 32137.45978 217.7 28887.5745 220.42 32554.42619 221.27 18363.90608 224.99 17416.11227 RESULTS Eigen frequency analysis is used to find out Quality factor. All the design and modeling are done using COMSOL Multiphysics software. 
The Eigen frequency analysis plot shown in Fig.5 shows the lowest BAW mode of the structure that occurs at a particular frequency obtained via material optimization. The eigen frequency values are shown in results as complex numbers wherein the real part provides the actual displacement frequency while the imaginary part is an indication of the extent of damping. Table IX shows the Q factor Vs Eigen frequency plot obtained using Rochelle Salt by rotating piezoelectric layer along 81°. Here maximum value of Quality Factor obtained is about 42696.38989 at an Eigen frequency of 218.84 MHz. TABLE IX. QUALITY FACTOR ANALYSIS BY ROTATING PIEZOELECTRIC LAYER ALONG 81◦ Eigen Frequency(MHZ) Quality Factor 216.15 23987.13199 216.76 13140.93297 218.84 42696.38989 220.4 38909.29835 223.42 26911.90005 Quality factor analysis by varying orientations is shown in Fig.4 below. MIPRO 2016/MEET Figure 5.The lowest bulk acoustic mode of the resonator identified from the solutions of the Eigen frequency analysis via using Rochelle Salt as piezoelectric layer. Fig.6 shows the lowest BAW mode of the structure that occurs at a particular frequency obtained via changing orientation. By changing orientations maximum value of Quality factor (42696.38989) is obtained by rotating piezoelectric layer along 81°. This is due to the crystallographic 133 effect of piezoelectric materials. The colour variation across the surface of the resonator and the corresponding displacement for that eigen frequencies are shown in the colour chart at the right side of the plot in Fig.5 and Fig.6. value of Quality factor (42696.38) is obtained.Higher the Quality factor value better the resonator and by improving Quality factor signal to noise ratio and sensitivity are also increased. REFERENCES Figure 6. The lowest bulk acoustic mode of the resonator identified from the solutions of the Eigen frequency analysis by rotating piezoelectric layer along 81◦ V. CONCLUSION Quality factor of different Bulk Acoustic Wave resonators using various piezoelectric materials are analyzed. It was found that by using Rochelle Salt a Quality factor of about 2999.729 is obtained at a frequency of 218.955MHz.This is the highest value obtained by analysing these piezoelectric materials. By changing the orientations it is found that at 81° maximum 134 [1] Sneha Dixit,” Film Bulk Acoustic Wave (FBAR) Resonator”, in Scientific andResearchPublications,Vol.5Iss.2,2015 [2]Gabriel.M.Rebeiz(2003),RF MEMS Theory, Design and Technology [3] Tapani Makkonen, Antti Holappa, Juha Ella, and Martti M. Salomaa,“Finite element simulations of Thin-Film Composite BAW Resonators,” IEEE Trans. On Ultrason, Ferroelectrics, and Frequency Cotrol, VOL. 48, no5,pp.1241-1258, 2001 [4] W.A. Burkland, A.R. Landlin, G.R. Kline, and R.S. Ketcham, “A thinfilm-bulk-acoustic-wave resonator-controlled oscillator on silicon,” IEEE Electron. Device Lett., vol. EDL-8, no. 11, pp. 531-533, 1987 [5] R.B. Stokes, J.D. Crawford, and D. Cushman, “Monolithic bulk acoustic filters to X-band in GaAs,” in Proc. IEEE Ultrason. Symp., pp. 547-551, 1993 [6] T.W. Grudkowski, J.F. Black, T.M. Reeder, D.E. Cullen, and R.A. Wagner, “Fundamental-mode VHF/UHF miniature acoustic resonators and filters on silicon,” Appl. Phys. Lett., vol. 37, no. 11, pp. 993-995, 1980 [7] R.F. Milsom and others, “Effect of mesa-shaping on spurious modes in ZnO/Si bulk-wave composite resonators,” in Proc. IEEE Ultrason. Symp., pp. 498–503. [8] K.M. Lakin, “Modelling of thin film resonators and filters,” IEEE MTT-S Digest, vol. 
1, pp. 149-152, 1992. [9] Tapani Makkonen, Member, IEEE, Antti Holappa, Juha Ell¨a, and Martti M. Salomaa, “Finite Element Simulations of Thin-Film Composite BAW Resonators,” in IEEE transactions on ultrasonics, ferroelectrics, and frequency control,vol.48,no.5,2007 [10] RanjanDey,D.Anumeha,” Design and Simulation of Portable Fuel Adulteration Detection Kit”,in Journal of Energy and Chemical Engineering, Vol.2Iss.2,PP.74-80,2014 MIPRO 2016/MEET Impact of Propagation Medium on Link Quality for Underwater and Underground Sensors G. Horvat, D. Vinko and J. Vlaović University of Osijek, Faculty of Electrical Engineering, Department of Communications Osijek, Croatia goran.horvat@etfos.hr Abstract - With the rapid development of wireless networks new application domains are continuously proposed. One of these applications includes an underwater and underground sensor which relies on the properties of diminished RF performance to send the gathered information to a base station. Although many researchers deal with the theoretical background of the channel and propagation models, very few experimental studies regarding the link quality were conducted. Due to the fact that new and more powerful sensor nodes are continuously being developed, the question that arises is how these new and more powerful classes of wireless sensors can handle the harsh environment in underwater and underground networks. Therefore, in this paper the authors analyze the impact of soil, water, moisture and other parameters on the link quality between two sensor nodes in various scenarios. The obtained results present a basis for designing and planning an underground/underwater sensor network. I. INTRODUCTION The performance of wireless sensor nodes (WSN) depends on the environment and the medium in which they operate. The underwater and underground environment significantly differ from the terrestrial environment where WSNs are commonly used. There is a number of papers proposing new protocols to cope with the propagation problems in both underwater [1] - [4] and underground [5] environment. If we compare underwater and underground environment, the research focuses on a different set of problems. In underwater applications, the focus is set on problems such as deployment algorithms for node placement [6], [7], powering and energy efficiency issues [8], [9]. In underground communications, the focus is mainly set on various applications in coal and metal mines [10] - [13], and underground train tunnels [14]. The security aspect [15], [16] and localization/tracking [17] is also being discussed but mainly for WSN applications in underground environments. When it comes to communication, researchers are exploring different approaches to extend the communication range in the underwater environment by using wireless sensor nodes with optical [18], [19] and acoustic communication [20], [21]. This is due to high attenuation of the electromagnetic (EM) wave [22] in water. Additional circuitry (optical and acoustic transducers) used to extend the communication range has a negative impact on power consumption and battery life of the sensor node. In both, the underground environment (such as coal mines and train tunnels) and terrestrial environment, a propagation medium is still air. Thus, standard EM wave propagation is used for communication. This paper focuses on sensor node performance when the node is transmitting EM waves through water or ground. 
There are some papers that deal with such channel modelling and there are few experimental studies in underwater [23] and underground [24] environment, but the link quality has not yet been thoroughly evaluated. Furthermore, as new sensor nodes are continuously being developed (higher RF sensitivity, higher RF power), the question that arises is how these new and more powerful classes of wireless sensors can handle the harsh environment in underwater and underground networks. Therefore, in this paper the following parameters are being analyzed in various scenarios: Received Signal Strength Indicator (RSSI), Link Quality Indicator (LQI), Round Trip Time (RTT), and Packet delivery probability (prr). The main objective of underground communication is to analyze the impact of communication distance and soil moisture on the monitored parameters. The goal of underwater communication is to analyze how the distance and additional impurities (salinity) affect the observed parameters. This paper is structured as follows: the measurement testbed for measuring underwater and underground communication is proposed in Section II. In Section III the measurement results for underground communication are presented, whereas in Section IV the measurement results of underwater communication are presented. Section V gives the conclusion of the proposed work and the guidelines for future work. II. MEASUREMENT TESTBED In order to examine the link quality parameters in the underwater and underground networks a testbed that consists of a point-to-point WSN and a testing chamber was proposed. Used WSN topology is based on a star topology, with one network Coordinator and n End nodes which form a WSN. Hence, in the proposed topology the first node is a network Coordinator whereas the second node is an End node. The diagram of the proposed measurement testbed is shown in Fig. 1. This work was sponsored by the J. J. Strossmayer University of Osijek under the project IZIP-2014-104 "Wireless power transfer for underground and underwater sensors" MIPRO 2016/MEET 135 power and higher RX sensitivity. The characteristics of the used WSN node are shown in Table I. Figure 1. The measurement testbed for underwater and underground sensors The network Coordinator is connected to a personal computer (PC) via a USB cable, which is also used for powering the Coordinator node. The End node is a battery powered node. An application designed for testing and measurement purposes is situated on the PC that communicates with the Coordinator. Both nodes are housed in waterproof containers to prevent contamination by moisture. The distance between the node’s antennas and the test chamber wall is greater than the diameter of the 1st Fresnel zone (> 0.18 m); no relevant part of RF radiation will propagate outside the test chamber. One of the analyzed testbed parameters is the distance between the nodes which refers to the distance between node antennas. The nodes are assumed to be placed on the same height and both antennas are directed to each other; PIFA (PCB Inverted F Antenna) with a gain of 2.28 dBi. A block diagram of the WSN node is shown in Fig. 2. The proposed methodology for measuring the link quality is based on convergecasting properties of the WSN; multiple nodes transmit measured values to a single coordinator of the network, but taking into account round trip communication. In the proposed methodology the sensor node sends a packet towards the coordinator and waits for the reception of an acknowledgement packet. 
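A simplified sketch of this send-and-acknowledge procedure (not the actual node firmware; the transmit call is a hypothetical placeholder) shows how the round-trip time and packet delivery probability can be accumulated on the End node before being reported to the Coordinator.

import random
import time

def link_quality_test(send_and_wait_ack, n_packets=100, timeout=0.5):
    """Send n_packets towards the Coordinator, wait for an acknowledgement after
    each one, and accumulate round-trip time (RTT) and packet delivery probability
    (prr).  send_and_wait_ack is a placeholder for the stack-specific transmit
    call; it must return True if the acknowledgement arrives within timeout seconds."""
    rtts, delivered = [], 0
    for _ in range(n_packets):
        t0 = time.monotonic()
        if send_and_wait_ack(timeout):
            rtts.append(time.monotonic() - t0)
            delivered += 1
        # inter-packet delay drawn from U(0.1 s, 1 s), as specified in the test procedure
        time.sleep(random.uniform(0.1, 1.0))
    prr = delivered / n_packets
    avg_rtt = sum(rtts) / len(rtts) if rtts else float("nan")
    return prr, avg_rtt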
A block diagram of the WSN node is shown in Fig. 2.

Figure 2. Block diagram of the used WSN node

The node is composed of a microcontroller and an RF section which consists of a transceiver and a Power Amplifier / Low Noise Amplifier (PA/LNA). With the proposed architecture the used nodes have higher TX power and higher RX sensitivity. The characteristics of the used WSN node are shown in Table I.

TABLE I. WSN NODE PARAMETERS
Frequency and modulation: 2.4 GHz, DSSS, 250 kbit/s
Transceiver: AT86RF231
Microcontroller: Atmel ATxmega256A3U
Communication stack: Atmel LightWeight Mesh
Transmission power: 20 dBm
Receiver sensitivity: -105 dBm
Packet generation interval: uniform (1 s)
Number of packets sent: 100
Packet size: 100 B
macMaxCSMABackoffs: 4
macMaxFrameRetries: 3

The proposed methodology for measuring the link quality is based on the convergecast properties of the WSN: multiple nodes transmit measured values to a single network Coordinator, but round-trip communication is taken into account. In the proposed methodology the sensor node sends a packet towards the Coordinator and waits for the reception of an acknowledgement packet. In this manner each sensor node has feedback information on whether the measured data reached the destination, what the delay (Round Trip Time, RTT) of the packet is, and what the packet delivery probability (prr) is. During the testing, the node sends 100 packets, each separated by a time delay drawn from the uniform distribution U(0.1 s, 1 s). After the packets are sent, the End node analyzes the data and transmits the link quality information to the Coordinator. This methodology is proposed because in harsh environments End nodes often lose communication with the Coordinator due to changes in environmental factors. With the proposed methodology a sensor node is aware of the link quality information and can therefore decide to transmit the data via RF or to store the data for future transmission (upon link quality improvement). In the course of the experiment two scenarios were analyzed, underwater and underground communication, by filling the test chamber with soil or water.

III. MEASUREMENT RESULTS – UNDERGROUND COMMUNICATION

The first scenario estimates the link quality of underground communication in WSNs. The proposed methodology of the experiment consists of several steps. The first step consists of measuring the link quality for various distances at a constant soil moisture, for uplink and downlink communication. In this step the testing chamber was filled with commercial soil of constant soil moisture of 56.2 %. The approximate density of the soil was 300 kg/m3. Measurements were performed every 10 cm up to a distance of 120 cm. The measurement results for the RSSI value are shown in Fig. 3.

Figure 3. RSSI for various distances vs. Downlink and Uplink channel (underground – soil moisture 56.2 %)

As seen in Fig. 3, as the distance between the nodes increases, the RSSI value decreases accordingly. The trend follows a log-distance model, widely known as the log-normal shadowing model. If the graph is plotted against the logarithm of the distance, it is possible to determine the path loss exponent for underground communication [26]. First it is important to define the path loss of a communication link as:

PL [dB] = Pt [dBm] − Pr [dBm] + Gt [dB] + Gr [dB]    (1)

where PL refers to the path loss, Pt refers to the transmitted power of the node (20 dBm), Pr refers to the received power (RSSI), and Gt and Gr refer to the antenna gains of the transmitter and receiver, respectively (2.21 dB). From (1) it is possible to calculate a linear regression line and determine the regression coefficients (Fig. 4).

Figure 4. Path loss for logarithmic distance with linear regression (underground)

It can be seen from Fig. 4 that there is a high correlation between the linear regression line and the measurement data (coefficient of determination of 0.96) versus the logarithmic distance between the nodes.
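The fit behind Fig. 4 can be reproduced in a few lines. The sketch below converts RSSI readings to path loss with (1) and fits PL against log10(d) by least squares; the RSSI values are placeholder numbers for illustration, not the measured data set of Figs. 3 and 4.

```python
# Sketch of the log-distance fit described above: path loss from (1), then a
# least-squares line PL = a*log10(d) + b, so that n = a/10 and PL(d0 = 1 m) = b.
# The RSSI values below are placeholders, not the authors' measurement data.
import numpy as np

P_TX_DBM = 20.0              # transmit power
G_TX_DBI = G_RX_DBI = 2.21   # antenna gains used in (1)

def path_loss_db(rssi_dbm: np.ndarray) -> np.ndarray:
    """Eq. (1): PL = Pt - Pr + Gt + Gr."""
    return P_TX_DBM - rssi_dbm + G_TX_DBI + G_RX_DBI

distance_m = np.arange(0.1, 1.3, 0.1)               # 10 cm steps up to 120 cm
rssi_dbm = np.array([-65, -72, -77, -81, -84, -87,  # placeholder readings
                     -89, -91, -93, -95, -96, -97], dtype=float)

pl = path_loss_db(rssi_dbm)
slope, intercept = np.polyfit(np.log10(distance_m), pl, 1)
n = slope / 10.0
residuals = pl - (slope * np.log10(distance_m) + intercept)
sigma = residuals.std(ddof=2)   # rough estimate of the shadowing spread

print(f"path loss exponent n ~ {n:.2f}")
print(f"PL(d0 = 1 m) ~ {intercept:.1f} dB, sigma ~ {sigma:.1f} dB")
```

Running the same fit on the actual RSSI samples would reproduce the exponent and reference path loss reported in the next subsection.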
As there are some variations from the linear model, these differences can be attributed to multipath fading, reflections, diffractions and similar radio irregularity phenomena [25]. These variations can be modeled using the widely used log-normal shadowing model:

PL(d) = PL(d0) + 10 · n · log(d/d0) + Xσ    (2)

where PL(d0) is the path loss at the reference distance d0, n is the path loss exponent that depends on the propagation medium, and Xσ is a normally distributed random variable with zero mean and standard deviation σ. The linear regression equation derived from Fig. 4 is

y = 54.969 · x + 105.28    (3)

To determine the path loss exponent of the underground communication we correlate equations (3) and (2), which results in the following path loss model:

PL(d) = 105.28 dB + 10 · 5.496 · log(d)    (4)

TABLE II. UNDERGROUND EMPIRICAL PATH LOSS MODEL
Path loss equation: 10 · 5.496 · log(d) + 105.28
n: 5.496
PL(d0 = 1 m): 105.28 dB
Xσ: 2.6 dB

From the analysis of the other link quality parameters in this step (LQI, RTT, prr), it can be concluded that the LQI ranges from 98.8 % to 100 % and the RTT ranges from 99 ms to 118 ms (averaging 106.6 ms), without any direct correlation with the distance or the RSSI value. Also, the packet delivery probability averages 98.6 %, with no direct correlation with RSSI or distance. It is important to notice that the used protocol stack performs packet retransmission on the MAC layer, which improves the reliability of the communication and achieves a high packet delivery probability.

The next step in the measurements includes increasing the soil moisture and measuring the link quality parameters. Soil moisture was controlled by adding tap water, with quality parameters described in Section IV. The first measured link quality parameter is RSSI (Fig. 5).

Figure 5. RSSI for various distances and soil moisture content (underground)

In Fig. 5 the previous measurement with 56 % soil moisture is shown alongside the measurement with the soil moisture increased to 61 %. It can be concluded that an increase in soil moisture of only 5 % drastically increases signal attenuation and the path loss. This can be seen in Fig. 6, where the RSSI value is shown for a distance of 50 cm with respect to soil moisture.

Figure 6. RSSI vs. soil moisture content at a distance of 50 cm (underground)

From Fig. 6 it is evident that an increase in soil moisture above 70 % will result in communication link failure at distances greater than 50 cm. This is a very important factor that needs to be taken into consideration when planning an underground WSN. The influence of soil moisture and distance on the round trip time (RTT) was analyzed next (Fig. 7).

Figure 7. Measurements of RTT vs. distance vs. soil moisture (underground)

From the measurements shown in Fig. 7 it can be concluded that the distance has no significant effect on the RTT parameter. However, soil moisture has a significant impact on the RTT, with a negative trend: an increase in moisture (an increase in RF attenuation) reduces the average RTT of the sent packet. This can be attributed to the reduction of reflections and diffractions due to the increased path loss of the medium. This can also be seen for higher moisture contents, where the RTT settles at its minimum value (Fig. 8). The first step of the underwater scenario consists of measuring the link quality for various distances in water, for uplink and downlink communication.
In this step, the test chamber is filled with water (approximately 110 L) and the nodes are positioned so that the distance between the antenna and the edge of the chamber is greater than the size of the 1st Fresnel zone. Nodes with antennas are positioned in a hermetically sealed enclosure made from 0.1mm nylon material, with PVC support. Measurement was conducted using tap water at temperature of 15 °C with conductivity of 870 µS/cm, hardness of 295 mg CaCO3, °D 16.5 and TDS 460 mg/L [27]. This value for conductivity is very close to the limit for brackish water (1000 µS/cm). Measurement was conducted at various distances where the antennas were positioned in a straight line. Measurement results for Uplink and Downlink value of RSSI link quality parameter are shown in Fig. 10. Figure 8. Measurements of RTT vs. distance vs. soil moisture (underground) By increasing the soil moisture beyond the 65%, the communication is drastically affected and the packet delivery probability drops to only 9 % (91 % of the packets do not reach the destination – the Coordinator). In these conditions the RTT increases due to the increase in retransmissions (Fig. 9). Figure 9. RTT and prr for various soil moisture content (underground) From the measurement results it can be concluded that for underground communication the most influenced parameter is soil moisture. Soil moisture can significantly degrade link quality and cause communication link failure. On the other hand, by increasing soil moisture a drop in RTT was observed that could be used as a model for advanced communication protocols. In comparison with path propagation in air, at maximum distance (120cm) the RSSI value for LOS air propagation equals -47 dBm, RTT equals 71 ms and packet delivery probability equals 100%, representing perfect communication condition. This shows the amount of attenuation of RF propagation induced by the soil or similar propagation medium. IV. Figure 10. RSSI for various distances vs. Downlink and Uplink channel (underwater) As seen from measurement results Uplink and Downlink channel results are consistent, with little deviation, resulting in an almost symmetrical communication link. On the other hand, the path loss in underwater communication exhibits large amounts of attenuation which results in a maximum communication distance of only 15 cm. This can be accounted for large value of water conductivity, CaCO3 and TDS. On the other hand, other link quality parameters are not significantly affected by the change in distance, resulting in mean value of packet delivery probability of 98.3%, mean value of RTT of 115.8 ms and average LQI of 254.5, not having any correlation with RSSI or distance between the nodes. On the other hand, as water quality parameters have a profound impact on the communication and link quality, the second step consists of adding impurities (salinity) in order to increase the conductivity of water. Salinity was increased up to 3 ‰ and RSSI was measured at the distance of 10 cm between the End node and the Coordinator. Results are shown in Fig. 11. MEASUREMENT RESULTS – UNDERWATER COMMUNICATION A second scenario involves the analysis of underwater communication in WSNs considering link quality parameters. The proposed methodology of the experiment consists of several steps. The first step consists of 138 Figure 11. 
RSSI for different values of salinity at 10cm distance (underwater) MIPRO 2016/MEET From the measurement results it can be concluded that the increase in salinity drastically affects the RSSI value; increases the path loss in the channel. However, other link quality parameters are not significantly affected by the change in distance; not having any correlation with salinity concentration. V. CONCLUSION This paper presents a study on the impact of propagation medium on link quality parameters for underwater and underground Wireless Sensor Networks. As the propagation medium can be versatile, this paper analyzes the influence of soil moisture on the propagation in underground networks and impurity levels in underwater communication. Measurements were performed with the proposed testbed composed of two WSN nodes; a Coordinator node and an End node. Nodes communicate within 2.4GHz band with TX power of 20 dBm, receiver sensitivity -105 dBm and PIFA antenna with gain of 2.21 dBi. The testbed is composed of a testing chamber filled with medium (soil/water) and a PC used to analyze the communication. Link quality parameters that were analyzed within this paper are RSSI, LQI, RTT and prr. Underground communication: By performing detailed measurements for various distances an empirical propagation model is proposed where the path loss exponent n and the path loss were calculated. From the measurement results it can be concluded that soil moisture has a profound effect on the link quality where the increase in soil moisture can reduce communication range by a factor of two, for only 10 % increase in soil moisture. An interesting phenomenon was observed whereby increasing soil moisture the communication delay RTT was reduced. Underwater communication: Similarly to underground communication an analysis was performed for various distances, regarding link quality parameters. Tap water was used with described water quality parameters. It can be concluded that the path loss in water with relatively high conductivity (850 µS/cm) is significant, resulting in a maximum communication distance of only 15 cm. As the content of impurities significantly affects communication, the next step involved adding impurities in order to increase the conductivity – salt. The increase in water salinity drastically affects the RSSI value; increases the path loss in the channel. However, other link quality parameters are not significantly affected by the change in distance; not having any correlation with salinity concentration. When salinity is 3 ‰, the maximum communication distance drops to 10 cm. Future work on the topic of underground communication targets calculating the path loss exponent for various soil moisture contents and various soil densities in order to present an empirical path-loss model for underground communication. Also, as the soil moisture affects RSSI, soil composition such as the presence of salts, or ions in the soil will change soil electromagnetic properties, affecting radio propagation. These properties will be investigated in future work. Furthermore, similar approach will be taken in underwater communication where the path-loss model will be proposed based on different water quality factors (conductivity, salinity, TDS MIPRO 2016/MEET etc.) and isolating the most influenced factor for underwater communication. Also, the measurement data will be compared with theoretical expectations from the available literature – existing propagation models. 
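As a closing illustration of how the empirical model summarized in Table II might be applied at the planning stage, the sketch below inverts the model to estimate a maximum communication range from the node parameters quoted in this paper (20 dBm TX power, -105 dBm sensitivity, 2.21 dBi antennas). Shadowing (Xσ) and the strong moisture dependence are ignored, so the number is only a rough, best-case figure and is not a result reported by the authors.

```python
# Rough range estimate from the empirical underground model of Table II:
# PL(d) = 105.28 dB + 10 * 5.496 * log10(d), with d in metres.
# X_sigma and the moisture dependence are ignored: planning-level figure only.
PL_D0 = 105.28   # dB at d0 = 1 m
N_EXP = 5.496    # path loss exponent (56.2 % soil moisture)

P_TX_DBM = 20.0
P_SENS_DBM = -105.0
G_TX_DBI = G_RX_DBI = 2.21

max_pl = P_TX_DBM - P_SENS_DBM + G_TX_DBI + G_RX_DBI   # tolerable path loss, dB
d_max = 10 ** ((max_pl - PL_D0) / (10 * N_EXP))        # invert the model

print(f"maximum tolerable path loss: {max_pl:.1f} dB")
print(f"estimated maximum range in this soil: {d_max:.2f} m")
```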
ACKNOWLEDGMENT The authors would like to thank Krunoslav Aladić, PhD from Croatian veterinary institute, Branch - Veterinary department Vinkovci for the analysis of soil moisture content. REFERENCES [1] Umar, A.; Akbar, M.; Iqbal, Z.; Khan, Z.A.; Qasim, U.; Javaid, N., "Cooperative partner nodes selection criteria for cooperative routing in underwater WSNs," in Information Technology: Towards New Smart World (NSITNSW), 2015 5th National Symposium on , vol., no., pp.1-7, 17-19 Feb. 2015 [2] Fahim, H.; Javaid, N.; Qasim, U.; Khan, Z.A.; Javed, S.; Hayat, A.; Iqbal, Z.; Rehman, G., "Interference and Bandwidth Aware Depth Based Routing Protocols in Underwater WSNs," in Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS), 2015 9th International Conference on , vol., no., pp.78-85, 8-10 July 2015 [3] Liaqat, T.; Javaid, N.; Ali, S.M.; Imran, M.; Alnuem, M., "DepthBased Energy-Balanced Hybrid Routing Protocol for Underwater WSNs," in Network-Based Information Systems (NBiS), 2015 18th International Conference on , vol., no., pp.20-25, 2-4 Sept. 2015 [4] Shah, M.; Javaid, N.; Imran, M.; Guizani, M.; Khan, Z.A.; Qasim, U., "Interference Aware Inverse EEDBR protocol for Underwater WSNs," in Wireless Communications and Mobile Computing Conference (IWCMC), 2015 International , vol., no., pp.739-744, 24-28 Aug. 2015 [5] Zhiping Zheng; Shengbo Hu, "Research challenges involving crosslayered communication protocol design for underground WSNS," in Anti-counterfeiting, Security and Identification, 2008. ASID 2008. 2nd International Conference on , vol., no., pp.120-123, 2023 Aug. 2008 [6] Felamban, M.; Shihada, B.; Jamshaid, K., "Optimal Node Placement in Underwater Wireless Sensor Networks," in Advanced Information Networking and Applications (AINA), 2013 IEEE 27th International Conference on , vol., no., pp.492-499, 25-28 March 2013 [7] Khalfallah, Z.; Fajjariz, I.; Aitsaadiz, N.; Langar, R.; Pujolle, G., "2D-UBDA: A novel 2-Dimensional underwater WSN barrier deployment algorithm," in IFIP Networking Conference (IFIP Networking), 2015 , vol., no., pp.1-8, 20-22 May 2015 [8] Amruta, M.K.; Satish, M.T., "Solar powered water quality monitoring system using wireless sensor network," in Automation, Computing, Communication, Control and Compressed Sensing (iMac4s), 2013 International Multi-Conference on , vol., no., pp.281-285, 22-23 March 2013 [9] Parmar, J.K.; Mehta, M., "A cross layered approach to improve energy efficiency of underwater wireless sensor network," in Computational Intelligence and Computing Research (ICCIC), 2014 IEEE International Conference on , vol., no., pp.1-10, 18-20 Dec. 
2014 [10] Dohare, Y.S.; Maity, T.; Paul, P.S.; Das, P.S., "Design of surveillance and safety system for underground coal mines based on low power WSN," in Signal Propagation and Computer Technology (ICSPCT), 2014 International Conference on , vol., no., pp.116-119, 12-13 July 2014 [11] Jin-ling Song; Heng-wei Gao; Yu-jun Song, "Research on Transceiver System of WSN Based on V-MIMO Underground Coal Mines," in Communications and Mobile Computing (CMC), 2010 International Conference on , vol.2, no., pp.374-378, 12-14 April 2010 [12] Longsheng Liu; Yue Li; Zhijun Zhang; Zhenghe Feng; Wenming Li; Da Zhang, "Experiment on underground propagation characteristic using CC110-based WSN," in Antennas and Propagation Society International Symposium (APSURSI), 2013 IEEE , vol., no., pp.1922-1923, 7-13 July 2013 139 [13] Xu Huping; Wu Jian, "Metal mine underground safety monitoring system based on WSN," in Networking, Sensing and Control (ICNSC), 2012 9th IEEE International Conference on , vol., no., pp.244-249, 11-14 April 2012 [14] Cammarano, A.; Spenza, D.; Petrioli, C., "Energy-harvesting WSNs for structural health monitoring of underground train tunnels," in Computer Communications Workshops (INFOCOM WKSHPS), 2013 IEEE Conference on , vol., no., pp.75-76, 14-19 April 2013 [15] Guofang Dong; Bin Yang; Yang Ping; Wenbo Shi, "A secret handshake scheme for mobile-hierarchy architecture based underground emergency response system," in Advanced Communication Technology (ICACT), 2015 17th International Conference on , vol., no., pp.54-58, 1-3 July 2015 [16] Li Rong, "A study of the security monitoring system in coal mine underground based on WSN," in Communication Software and Networks (ICCSN), 2011 IEEE 3rd International Conference on , vol., no., pp.91-93, 27-29 May 2011 [17] Li Zhang; Xunbo Li; Liang Chen; Sijia Yu; Ningcong Xiao, "Localization system of underground mine trackless facilities based on Wireless Sensor Networks," in Mechatronics and Automation, 2008. ICMA 2008. IEEE International Conference on , vol., no., pp.347-351, 5-8 Aug. 2008 [18] Anguita, D.; Brizzolara, D.; Ghio, A.; Parodi, G., "Smart Plankton: a Nature Inspired Underwater Wireless Sensor Network," in Natural Computation, 2008. ICNC '08. Fourth International Conference on , vol.7, no., pp.701-705, 18-20 Oct. 2008 [19] Ghelardoni, L.; Ghio, A.; Anguita, D., "Smart underwater wireless sensor networks," in Electrical & Electronics Engineers in Israel (IEEEI), 2012 IEEE 27th Convention of , vol., no., pp.1-5, 14-17 Nov. 2012 [20] Misra, S.; Ghosh, A., "The effects of variable sound speed on localization in Underwater Sensor Networks," in Australasian Telecommunication Networks and Applications Conference (ATNAC), 2011 , vol., no., pp.1-4, 9-11 Nov. 2011 140 [21] Yong-sheng Yan; Hai-yan Wang; Xiao-hong Shen; Fu-zhou Yang; Zhao Chen, "Efficient convex optimization method for underwater passive source localization based on RSS with WSN," in Signal Processing, Communication and Computing (ICSPCC), 2012 IEEE International Conference on , vol., no., pp.171-174, 12-15 Aug. 2012 [22] Uribe, C.; Grote, W., "Radio Communication Model for Underwater WSN," in New Technologies, Mobility and Security (NTMS), 2009 3rd International Conference on , vol., no., pp.1-5, 20-23 Dec. 2009 [23] Abdou, A.A.; Shaw, A.; Mason, A.; Al-Shamma'a, A.; Cullen, J.; Wylie, S., "Electromagnetic (EM) wave propagation for the development of an underwater Wireless Sensor Network (WSN)," in Sensors, 2011 IEEE , vol., no., pp.1571-1574, 28-31 Oct. 
2011 [24] Stuntebeck, E.P.; Pompili, D.; Melodia, T., "Wireless underground sensor networks using commodity terrestrial motes," in Wireless Mesh Networks, 2006. WiMesh 2006. 2nd IEEE Workshop on , vol., no., pp.112-114, 25-28 Sept. 2006 [25] G. Horvat, D. Šoštarić and D. Žagar, "Using radio irregularity for vehicle detection in adaptive roadway lighting," MIPRO, 2012 Proceedings of the 35th International Convention, Opatija, 2012, pp. 748-753. [26] A. Alsayyari, I. Kostanic and C. E. Otero, "An empirical path loss model for Wireless Sensor Network deployment in an artificial turf environment," Networking, Sensing and Control (ICNSC), 2014 IEEE 11th International Conference on, Miami, FL, 2014, pp. 637642 [27] Dadić, Ž., PREGLED KVALITETE PITKE VODE U HRVATSKOJ, Priručnik o temeljnoj kakvoći vode u Hrvatskoj, WaterLine, Quality water systems. MIPRO 2016/MEET Electrical Field Intensity Model on the Surface of Human Body for Localization of Wireless Endoscopy Pill Bojan Lukovac, Ana Koren, Antonija Marinčić, Dina Šimunić University of Zagreb, Faculty of Electrical Engineering and Computing Unska 3, 10000 Zagreb, Croatia E-mail(s): bojan.lukovac@ericsson.com, {ana.koren, antonija.marincic, dina.simunic}@fer.hr Abstract—Constant struggle in medical science to develop new, more advanced methods of medicine application and reducing the invasiveness of a range of tests and procedures is present. For achieving this ambitious goal, it is necessary to find and develop new methods and technologies and to applicate them. The issue of localizing a range of devices inside the human body is one of them. S olution to this problem would enable the upsurge of nanorobots (nanobots) for micro-operations, application of strong medicine on limited areas to lessen its side effects, conducting the nanotoxicology reports or reducing the discomfort of some invasive procedures like endoscopy. In this paper, the focus is on the latter and an example of an endoscopy with a pill is given. Due to maximizing the effectiveness of endoscopy by the use of the pill there is a need for optimizing the technology used for its localization. S EMCAD tool is used in modelling of the scenario and executing the simulations. Keywords— WBAN; Electrical Field; Dipole; Endoscopy; Localization; SEMCAD potential problems is found, difficulty of deducing the exact location in the digestive tract presents itself, using solely the images surrounding it. Thus, the process is not precise enough to use in any operative procedures and it is usually necessary to repeat endoscopy with classical methods in order to confirm the diagnosis. Localization of the endoscopy pill implies determining its location, whether in real-time or retroactively, in respect to a referent point. It is expected for the solution to be portable, and not requiring the hospitalization of the patient for the time of procedure (time necessary for the pill to travel the digestive tract is approximately 12 hours). Furthermore, human body consists of bones and tissue which are arranged unevenly. Tissue has different density in different locations and is heterogeneous, which is why the issue of precise localization inside the human body remains open. In the next chapter (Chapter 2), the explanation of the selected antenna model is given, followed by the simulation and results of the model using SEMCAD (Chapter 3). Finally, conclusions are drawn in the Chapter 4. I. 
INT RODUCT ION Medical field is in a constant search for new, more advanced methods of medicine application and reduction of the invasiveness of a range of tests and procedures , one of which is endoscopy. Endoscopy is a procedure used for viewing the internal organs and vessels of a patient’s body without making incisions. It is a common and powerful diagnostic tool for digestive diseases . Classical endoscopy is described as an extremely unpleasant and painful procedure which doesn’t allow the exploration of parts of digestive tract (i.e. small intestine). The endoscopy, in which a pill is swallowed by the patient, travels through digestive system and documents it by taking photographs already exists, however it hasn’t replaced the classical method. This is due to it being less exact as the localization of the pill (and any device in general) inside of human body is not precise enough due to several challenges. Current procedure of diagnosis via endoscopy pill consists of the doctor manually viewing all the images provided by the pill, which can sometimes be u p to 100.000 images, depending on the manufacturer. Currently there is no automatic filtering of non-relevant images. When the image where there’s an indication of MIPRO 2016/MEET II. PROPOSED A NT ENNA M ODEL Endoscopic capsules are medical diagnostic devices 32 mm long and 11 mm in diameter consisting of a camera module, battery, antenna and transmitter along with additional sensors that depend on the manufacturer. They were developed in order to reduce the discomfort inherent to traditional endoscopic procedures and allow for detection of medical issues of the otherwise unobservable small intestine. Design of the antenna is an iterative process which demands multiple calculations with the goal of optimizing all the relevant parameters for working on the frequency specified. The antenna contained in the pill used in simulations is modelled in SEMCAD tool as a simple dipole antenna, consisting of two cylindrical parts with diameter of 1 mm, length of 10 mm and spacing of 1 mm. Antenna in the simulations is assumed to be a perfect electric conductor (PEC) in order to simplify the experiment. As shown in Figure 1, the antenna is sheathed with dielectric cylinder to protect it from contact with the surrounding tissue. Material used for the cylinder is Rogers 141 TMM-10 substrate which has electrical conductivity of 0.00087 S/m, relative permittivity 9.2 and density 1000 kg/m3). a) Figure 1. Isometric view of the antenna modelled in SEMCAD Model of the pill is placed onto a several locations in the body, though the majority of the measurements are executed on one of the three locations. b) III. SIMULAT ION AND RESULT S For the simulations, the software SEMCAD (Sim4Life) has been used, namely its most detailed model of the human body, Duke – a 36 year old man. First location of the antenna is in the large intestine in position surrounded by the pelvic bones (Figure 2a). Second location is the lumen of small intestine in the proximity of navel, on the front end part of the abdomen, which is a location that enables us to observe the difference in attenuation between front part of the body (with just a thin layer of muscle, fat and skin) and the back part (where attenuation is caused by intestine, organs, spine, muscles, fat and skin). This is shown in Figure 2b. 
Final location measured is the middle of the stomach lumen, which is characterized by the presence of the rib cage and multiple organs which significantly influence the strength of the signal (Figure 2c). These locations are selected as they represent well the various conditions inside the human’s body. Furthermore, in these three locations, measurements have been performed with harmonic signal and Gauss impulse; the pill has been rotated parallel to all three axes and; the measurements in the space diagonal have been performed. The intensity of the electrical field and Poynting vector has been measured. Poynting vector was selected as an additional method due to practical reasons. 142 c) Figure 2. Locations of the antenna marked with a red circle: pelvic bones (2a), lumen of small intestine, (2b) and middle of stomach lumen (2c) Expected diagram of dipole antenna radiation in free space is shown in Figure 3. Areas in the red specter present lower power while the green areas present higher relative power. All the simulations have been performed using the antenna transmitting at 2.4 GHz frequency, and the source characteristics being 5 V voltage and internal resistance 50 Ω. MIPRO 2016/MEET b) Figure 5. Electrical field intensity (front and side view) Figure 2. Diagram of dipole radiation in free space As an additional method, Poynting vector was measured. Results for the same scenario are given in Figures 5c and 5d below. Following figures (Figure 4a and 4b) show the legend for the following simulations (Figures 5 through 7). a) b) Figure 4. Reference values of electrical field intensity (left) and Poynting vector value (right; legend applicable for all simulations) a) Furthermore, the radiation of the dipole can be observed on the following figures, as well as the position (localization) of the pill. In Figures 5a and 5b it is clear that the intensity of the electrical field suffers from strong attenuation caused by the pelvis bones. b) Figure 5. Poynting vector value in the first scenario (front and side view) a) MIPRO 2016/MEET This leads to the following conclusions. First, electrical field intensity is less indicative than the value of Poynting vector due to lesser loss of intensity inside the body. Secondly, the area in the pelvis region strongly affects the radiation of dipole antenna; pelvis bones cause reflection and attenuation which causes the diagram of radiation starts to look almost isotropic. The results of the second simulation scenario are presented in Figure 6. 143 the antenna is located. The importance of this result is great since the recognition of the expected value potentially enables the assessment of the pill’s orientation. Figure 6. Poynting vector value in the second scenario (side view) Again, several conclusions can be drawn. The proximity of the radiation source to skin causes propagation of electromagnetic wave and distorts the radiation diagram. Organs, bones and tissue cause the expected loss of wave strength in all other directions. The surroundings have much larger influence on the radiation diagram than the orientation of the pill. Thus, precise localization of the pill would require an algorithm which would define measurements on a large number of locations in a body so it would be possible to eliminate the influence of the wave propagation in the skin. Results of the third and last scenario are given below in Figures 7a and 7b. Figure 8. 
Radiation diagram distortion on the skin surface Closer analysis of the radiation gives an insight into specific interactivity of the wave and skin and is shown on Figure 8. On the area, marked with a red circle, it is clearly visible how weak the signal is in the area between the pill and skin (in line with the expectations) but then greatly increases through the skin surface, due to refractive index between air and skin. Thus, it would be advisable to perform the measurements at approximately 1 cm distance from the skin surface. IV. CONCLUSION a) b) Figure 7. Electrical field intensity and Poynting vector value in the third scenario The stomach has proved to be the best location for all tests of dipole antenna since the result received has the expected doughnut diagram of dipole antenna radiation, most likely due to the homogenic nature of the stomach lumen where 144 The issue of localizing a range of devices inside the human body is extremely relevant in today’s medical science as the solution to this problem would enable the upsurge of nanorobots (nanobots) for micro-operations, application of strong medicine on limited areas to lessen its side effects, conducting the nanotoxicology reports or reducing the discomfort of some invasive procedures, endoscopy included. Several conclusions were drawn from the past simulations in SEMCAD. The greatest impacts on the propagation of electromagnetic wave inside human body have transitions between bones and other tissue and transitions between skin and air. Low electrical conduction and density of the bones has the largest effect on the wave propagation having in mind the ratio of bone volume in the total body volume. Choice of antenna is of crucial importance. The Rogers TMM-10 substrate cylinder sheathed dipole antenna gives realistic radiation diagrams. In some of the simulated scenarios (locations) the parameters of the environment (surrounding area) impacts the radiation diagram severely, to the point where it is impossible to draw conclusions of the pill position or orientation. Thus, it is necessary to develop algorithms which would take into account the environment and eliminate its effects (e.g. wave propagation through skin) in MIPRO 2016/MEET order to ensure precision. Furthermore, measurements should be conducted without skin contact. Point of refractive index between skin and air deforms radiation diagram and makes the assessment of the position and orientation of the antenna more difficult. Hence, it is advisable to move the measuring instrument at least 1 cm away from the skin surface to avoid diagram deformation. Lastly, graphs of electrical field intensity and Poynting vector both describe similar radiation diagrams. However, due to, by far, easier and more practical measuring of the received power (which is directly proportional to the Poynting vector) using a sensor in comparison with measuring the electrical field intensity it is advisable to use measurements of Poynting vector (i.e. received power) for calculations. REFERENCES [1] Penazzio M. et al., Small-Bowel Capsule Endoscopy and DeviceAssisted Enteroscopy for Diagnosis and T reatment of Small-Bowel Disorders: European Society of Gastrointestinal Endoscopy (ESGE) Clinical Guideline, Endoscopy 2015 nr. 47, 2015., pp . 352.-376. [2] Ladas, S.D: et al. European Society of Gastrointestinal Endoscopy (ESGE): Recommendations on Clinical Use of Video Capsule Endoscopy to Investigate Small-Bowel, Esophageal and Colonic Diseases, Endoscopy 2010 nr. 42, 2009., pp. 220.-227. 
[3] Woods, S.P., Constadinou, T .G., Wireless Capsule Endoscope for T argeted Drug Delivery: Mechanics and Design Considerations, IEEE T ransactions on Biomedical Engineering, Vol. 60, nr. 4, April 2013., pp. 945. [4] Yuce, M., Dissanayake, T., Easy-to-Swallow Antenna and Propagation, IEEE Microwave Magazine, nr. 14, June 2013., pp. 74 -82. [5] Wen-cheng Wang et al., Experimental Studies on Human Body Communication Characteristics Based Upon Capacitive Coupling, Body Sensor Networks (BSN), 2011 International Conference on, Dallas, USA, May 2011., pp. 180 – 185. [6] Sazonov, E., Neuman, M. R., Wearable Sensors: Fundamentals, Implementation and Applications, Academic Press – Elsevier, San Diego, USA, 2014., pp. 462. [7] Francesco Merli et al., Design, Realization and Measurements of a Miniature Antenna for Implantable Wireless Communication Systems, IEEE T ransactions on Antennas and Propagation, Vol. 59, nr. 10, October 2011., pp. 3544-3555. [8] Vallejo, M., et al., Accurate Human Tissue Characterization for EnergyEfficient Wireless On-Body Communications, Sensors (Basel) Vol. 13., nr. 6., 2013., pp. 7546.-7569. MIPRO 2016/MEET 145 Wide band current transducers in power measurment methods - an overview Roman Malarić, Željko Martinović *, Martin Dadić, Petar Mostarac, Žarko Martinović ** Faculty of Electrical Engineering and Computing/ Department of Electrical Engineering, Fundamentals and Measurements Unska 3, 10000 Zagreb, Croatia *COMBIS / Baštijanova 52A, Zagreb, Croatia **Danieli Systec, Katuri 17, Labin, Croatia e-mail:, roman.malaric@fer.hr, zeljko.martinovic@combis.hr, martin.dadic@fer.hr, petar.mostarac@fer.hr z.martinovic@systec.danieli.com Precise power measurement is usually done at national metrology institutes. In recent years researchers have devoted substantial time toward a measurement of power under nonsinusoidal conditions, especially in power grids because of renewable energy increase. The challenge is not only to measure power at 50 Hz, but also to measure harmonics and inter harmonics. To measure power at needed accuracy, accurate and frequency independent current and voltage transducers are needed up to 100 kHz. In this paper an overview and state of the art is given for current transducers used for this purpose. This mainly includes AC shunts of coaxial design, calculable resistors, precise transformers and methods for their characterization. I. INTRODUCTION In order to transform the currents to voltages suitable for analog to digital conversion (usually 1 V RMS) to obtain power and energy, precise current shunts or shunt/transformer combination are used for this purpose. This, in particular, is important for the measurement of power and energy under distorted waveform conditions. Current shunts therefor represent one of the most important parts of power measuring system (fig.1.). U/I source Shunts Dividers A/D conversion Algorithms Calculation Fig. 1. Precise power measurement system II. CURRENT SHUNTS Coaxial shunts have a better frequency behavior than shunts which are not coaxially connected [1], and smaller AC-DC current transfer differences up to 30 kHz or even 100 kHz. Shunt design have been mostly based on work done at Dimitry Ivanovich (VNIIM) but there have been few exceptions and different designs. The most often used resistors are low inductance metal foil type. NRC [2] shunts for currents up to 200 mA are a star configuration of resistors soldered on a single-sided printed circuit board. 
For the current ranges from 0.5 to 10 A, the shunts are Mendeleev 146 type built of three plates connected by a number of ribs. The plates and the ribs are cut from the double-sided copper-clad boards. The shunt resistor consists of a number of Vishay type S102C metal film resistors, concentrically mounted between the output and the middle plates, and connected in parallel by the copper layers of these plates. The input plate is removed from the two output plates by the length of the rib switch minimizes the inductive coupling between the input and the output circuits. This design is similar to the SP type [3][4] which are designed from 50 mA to 100 A, CSIRO shunts [5] and SIQ shunts [6] which are coaxial type ranging from 100 mA to 20 A, with frequency range from 10 Hz to 30 kHz. The input and output sides are separated by 17-cmlong crossbars that are made of double-sided printed circuit boards. The number of crossbars and the diameter of the circular plates depend on the current shunt’s nominal current. CMI shunts [7] are also cage type designed to be well suited for use with planar multi-junction thermal converters (PMJTCs). The voltage drop across every shunt (30 mA to 10 A) in parallel with a 90 PMJTC is 1 V at nominal current. A calculable model was developed for these shunts using lumped circuit elements. Temperature coefficients were reduced by using a suitable combination of resistors types (S102C, S102K and Z201). New generation of shunts were produced based on results obtained from the analysis of the lumped element model of the original shunt [8]. At BEV [9] a manganine foil is wound as the resistive element around a cylindrical core made of glass fiber epoxy. One copper foil at the outside brings the low potential back to the current input, another copper foil underneath the manganine brings the high potential from the input to the output side. All foils are soldered onto copper discs which are mechanically connected to the core. This principle applies for shunts in the range from 2 A to 100 A, except that for shunts from 2 A to 10 A the core is made of a brass tube instead of copper foil. Shunts from 100 mA to 1 A is made with metal film resistors soldered in parallel in a 4-terminal arrangement with a small loop in both the current and potential path. The output voltage at nominal current is 0,4 V. INRIM [10] current shunts cover the range from 5 mA to 20 A with nominal voltage of 1 V to cover the input voltage of planar multi-junction thermal converter (PMJTC). The shunts with value from 5 mA to 100 mA have a single SMD resistor connected between the central MIPRO 2016/MEET conductor and a return wire at the external coaxial screen. Shunts up to 2 A are made with a disk of a double-sided printed circuit board with conductive layer removed from one side except for the edge, and on the other side, a ring is machined out of the conductive layer. The conductive parts and the external edges that are connecting the two sides are coated with gold, with 10 SMD resistors soldered across to the insulating ring. For ranges up to 20 A the disks are made of a thermally conductive ceramic material 10 mm thick with a diameter of 80 mm. The resistive parts are made with manganin foils that are fixed by thermally conductive adhesive resin. The foils are then patterned to the proper shape. 
At JV[11], a new type of shunt for ac–dc current transfer was developed for frequencies from 10 to 100 kHz and current ranges of 30 mA–10 A with uncertainties smaller than ±9 μA/A using surface-mount resistors of the cylindrical metal-electrode face-bonding (MELF) type which are cheap and easily available with temperature coefficients below 10 μΩ/Ω per K. The shunts have very low and calculable ac–dc current transfer differences in the frequency range of 10 Hz– 100 kHz. INTI [12] presented three different shunt designs for AC-DC current transfer which has been evaluated. The nominal current values are 5 A and 10 A with nominal output of 1 V. One of the shunt is similar to cage type design and two other differ from known designs, however the paper lacks the comparison results. Commercially available shunts include Fluke A40B series, Guildline 7340 and 7350 series, Transmille, and some of the already mentioned in this paper from National Metrology Institutes. A new AC current shunt was also designed at NIM that uses minalpha wires instead of commercial available resistors [13]. Shunt showed good results up to 100 kHz. A. Use of transformers to lower current level Sometimes current transformers are used to lower the currents to acceptable levels and are then burdened with low current shunts. INMETRO [14] developed a special twostage transformer, which is coupled to a standard shunt of 10 Ω. At NIM [15] a set of shunts is used for current ranges of 0.1, 0.2, 0.5, 1, 2, 5, 10, and 20 A, while a current transformer is adopted for 50 A. The full-scale output signals of the voltage dividers and the shunts are 0.8 V regardless of the voltage/current range, which are the input signals of the corresponding two DVMs. INTI [16] developed current transformers with 100 mA output value which are used together with 10 Ω AC shunt to obtain 1 V output. At PTB Primary AC power standard [17] current transformers together with AC shunt is used as well. INRIM [18] current to voltage converter is made up of a double-stage current transformer with a precision 1 Ω AC resistor connected to its output. B. High current shunts Recent developments have seen the realization of 100 A shunt with considerable uncertainty improvement so that a current transformer is not needed any more up to 100 A [19] that have AC-DC difference below 20µA/A up to 100 kHz or with [4] AC-DC difference of -36 μA/A up to 100 kHz. These are designed in similar manner like the lower value MIPRO 2016/MEET shunts using Mendeleev cage type. NRC high current shunts [20] have been produced in cooperation with BEV. Each shunt consists of three coaxial cylinders mounted between four copper plates on a glass fiber epoxy cylinder. The middle cylinder, the resistor of a shunt is manufactured from a thin manganine foil. The AC-DC difference is lower than 100 μA/A at 100 kHz. BEV Institute [21] also manufactures high current TEE connector needed for calibration of these shunts. There are also some commercial shunts like Guidline 7340 and Fluke A40B that are available up to 100 A. C. Calculable AC-DC resistors The AC-DC difference of shunts can be obtained using the modelling the shunts with lumped elements and calculating the frequency response and by calibrating the shunts with AC bridge which needs a calculable resistance standard or comparing it against thermal voltage transfer standards. Several types of the calculable ac/dc resistors have been presented in literature as coaxial, bifilar, quadrifilar and octofilar designs. 
The time constant for coaxial and bifilar resistors was analyzed [22]. Usually a simple geometrical structure is used. As for more recent research, the calculable coaxial resistor was designed [23] based on the coaxial line with a cylindrical shield which can be described by relatively simple equations for the real and imaginary parts of the impedance. The resistors consist of a straight even ohm resistance wire surrounded by a 51 mm diameter coaxial brass case with nominal value of 1000 Ω. However, the resistor was evaluated only at one frequency of 1592 Hz with good results. In an inter-comparison [24] of calibration systems for AC shunts up to audio frequencies (10 kHz) between the NRC, JEMIC, and NIST is presented. The comparison was implemented with a calculable transfer ac/dc shunt, designed by JEMIC at frequencies up to 10 kHz in each laboratory. Both laboratories use for that purpose its own calculable AC-DC resistors and AC bridges. The NRC uses two calculable ac/dc quadrifilar resistors of 100 and 1000 Ω described by Gibbons [25]. JEMIC [26] and NIST [27] compares a 0.1 Ω bifilar reference current shunt with calculable amplitude and phase response. A new design for calculable coaxial resistors [28] with the values of 1 kΩ, 10 kΩ and 12,906 kΩ have been modeled by using Mathcad computer program and evaluated in. Also, a new coaxial resistors [29] for use with quantum Hall effect-based impedance measurements was designed at PTB and CMI. The improvement compared with existing resistors is in easier handling of inner resistive wire from the shield. D. AC Shunt modelling Shunts differ from calculable AC-DC resistors as they are not so simple to model having many elements such as ribs, plates and resistors. Some of the shunts were just calibrated using thermal voltage converters, some have been evaluated using simple models, and some of the shunts have been modeled using detailed equivalent schemes. A calculable model [30] for Mendeleev type shunt was developed using lumped circuit elements which can be used to calculate trans impedance, ac–dc difference, and the phase angle error of a shuns. It is based only on calculations of all component 147 values from the geometry and material properties (except resistors). Compared with measured results the difference is less than 6 μΩ/Ω in the AC-DC difference and 110 μrad in the phase angle error at frequencies up to 100 kHz. SIQ shunts [31] model was made using lumped element modeling and then analytical transfer function and input impedance of the shunt were derived from the model and used to calculate frequency response and input impedance as a function of frequency. These calculations were then compared to the calibrated values of AC/DC difference and against measured input impedance. Simplified model of BEV foil shunts [32] have been presented. Also, the simplified circuit of INRIM shunts [10] have been presented but without mathematical modeling. NRC shunts [2] have been modeled assuming that shunt consists of several identical components. The input and output front plates were modeled as R−C−R T-networks, the ribs as (R + L−C−R + L) T-networks, and the shunt resistor as (R + L)C assuming that the shunt is a two terminal-pair device. The model, even though too imprecise for a theoretical characterization of a “calculable” shunt was used in selecting the length of the ribs. 
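Before continuing with the rib-length trade-off discussed next, the lumped-element idea used throughout this subsection can be illustrated with a minimal sketch: the shunt is represented by its DC resistance R with a series inductance L, in parallel with a capacitance C. The relative change of |Z| with frequency then stands in for the calculable AC-DC difference and the argument of Z for the phase angle error. The R, L and C values below are invented for illustration and do not correspond to any of the shunts cited above.

```python
# Minimal lumped-element sketch of a shunt: series R-L in parallel with C.
# The relative deviation of |Z| from R approximates the AC-DC difference and
# arg(Z) the phase angle error. R, L, C are illustrative, not measured values.
import numpy as np

R = 0.8        # ohm   - illustrative
L = 2e-9       # H     - illustrative
C = 100e-12    # F     - illustrative

f = np.logspace(1, 5, 5)                      # 10 Hz ... 100 kHz
w = 2 * np.pi * f
z_series = R + 1j * w * L
z = z_series / (1 + 1j * w * C * z_series)    # series RL in parallel with C

acdc_diff = (np.abs(z) - R) / R               # relative magnitude deviation
phase_rad = np.angle(z)                       # phase angle error

for fi, d, p in zip(f, acdc_diff, phase_rad):
    print(f"{fi:9.0f} Hz: delta ~ {d*1e6:8.3f} uOhm/Ohm, phase ~ {p*1e6:9.2f} urad")
```

The published models cited here are of course far more detailed (T-networks per rib and plate), but they are evaluated in exactly this way: build the network, compute Z(f), and read off the AC-DC difference and phase angle.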
The longer ribs increase the distance between the input and the output, thus reducing the coupling between the two circuits, which is a major source of the shunt AC-DC difference. Shorter ribs, which were eventually used in the design, decrease the internal capacitance of the shunt. The number of ribs and resistors is again a compromise: increasing the number of ribs increases the nominal power rating of the shunt, but also increases the internal capacitance. JV shunts [11] are modeled using lumped circuit elements. The model includes approximations to the parasitic reactive and resistive components resulting from the chosen shunt geometry. Component values are measured and/or calculated from the geometry and material properties. Finally, an important paper is the analysis of the shunt-TVC combination [33], which describes the relationship between the overall AC-DC differences of a shunt and thermal-converter combination.

E. Measurement of AC-DC current difference of shunts

The AC-DC difference is defined as:

δAC-DC = (VAC − VDC) / VDC    (1)

The AC-DC current transfer difference is the most important characteristic of an AC shunt and can be measured in three ways: by inter-comparison with a reference shunt, by calibration against thermal voltage standards, and by measurement with a precise AC bridge. For example, SIQ shunts have been compared to current shunts [6] with a known AC-DC difference. The AC-DC difference was less than 20 µA/A for frequencies up to 30 kHz and currents up to 20 A. However, to properly characterize current shunts it is necessary to compare them with thermal voltage converters, which have a flat AC-DC characteristic up to 100 kHz. At CMI [34] a step-up procedure is used to measure the AC-DC current transfer difference from 1 mA up to 10 A in the frequency range 10 Hz – 100 kHz by comparing it with a PMJTC 10 mA AC standard. The AC-DC difference was less than 50 µA/A for all frequencies and currents. The same method was also used to determine the frequency characteristics of NRC shunts [20]. The AC-DC current difference for the in-house built shunts from 100 mA to 10 A is less than 100 µA/A up to 100 kHz, but for commercial shunts from 30 A up to 100 A the AC-DC difference is considerably larger, particularly at frequencies above 10 kHz. JV shunts [11] were evaluated for AC-DC difference using a PMJTC in a digital bridge setup for comparing the AC-DC current differences, similar to the system described by Rydler [35]. A complete step-up from 10 mA to 10 A was made, where the unknown shunt/PMJTC for the next higher current is calibrated at the current level of the known converter and then used at its rated current. At 10 mA, the shunt/PMJTC was calibrated directly against the primary reference for the AC-DC current difference. The other steps were 30 mA, 100 mA, 300 mA, 1 A, 3 A, 5 A, and 10 A. The AC-DC difference was determined to be less than 20 μΩ/Ω for currents from 50 mA to 10 A and frequencies up to 100 kHz. The frequency characteristic of a shunt–thermal converter combination depends on the frequency characteristic of the shunt [2], the frequency characteristic of the thermal converter, and the mutual inductance between the shunt and the thermal converter, which can be neglected for this particular shunt design. With some simplifications, the AC-DC transfer difference of a shunt–thermal converter combination δi can be presented [32] as follows:

δi ≈ RTVC / (RS + RTVC) · δV + RS / (RS + RTVC) · δI + δR    (2)

from where the AC-DC current difference of the shunts can be calculated. In addition, the AC-DC difference of current shunts can be evaluated using different AC bridges and resistors with calculable AC response [36].

F. Phase angle error

The phase angle error calculated from the model can be validated by measurement, even though this is a difficult task, as the shunts are designed to have a very low phase angle error. Most commercial shunts do not even specify the phase angle error. Measurements of shunt phase angle errors have been performed [37] in various ways at the SP research center. Four-terminal inductances are determined using an LCR meter and calculable inductance standards, and the calculated phase angle errors are then verified at frequencies up to a few kHz by comparison with a digital sampling wattmeter [38] for shunts from 0.05 A to 10 A. Later [39] this was extended to 1 MHz using a phase comparator comprising fast and accurate digitizers. The agreement between the different methods, for the measurement of the phase angle difference of two shunts, was within a few µrad at 1500 Hz. Phase angle errors of shunts [40] with rated currents of 50 and 100 A at frequencies from 25 to 100 kHz have been determined using a three-branch binary inductive current divider, which measures the phase angle errors of high-current shunts against a phase angle reference standard in only one step. Later [41] this was extended to 200 kHz. A new method [42] was described to determine the phase angle errors of AC shunts by measuring the inductance and the distributed capacitance. For this purpose several units have been developed: a 1 Ω shunt of coaxial design as the time constant standard, a coaxial inductor with a structure identical to the time constant standard, and a four-terminal mutual inductor for the measurement of the inductance of the time constant standard. In [43] a method is proposed to determine the phase angle of shunts using a group of micropotentiometer resistors designed at CSIRO, whose phase angle error can be described, in the first approximation, with a formula. With the build-up process the phase angle can be determined for currents from 100 mA up to 20 A and frequencies from 40 Hz to 200 kHz. A level-dependence method for the phase angle error has been proposed [44], because in the build-up procedures the unknown shunts for the next higher rated current are calibrated at the current level of the known shunts and then used at their rated current. A wideband phase comparator has been developed at INRIM for high-current shunts [45]. The two-input digital phase detector is realized with a precision wideband digitizer connected through a pair of symmetric active guarded transformers to the outputs of the shunts under comparison. The system is suitable for comparing shunts in a wide range of currents, from several hundred milliamperes up to 100 A, and frequencies ranging between 500 Hz and 100 kHz. The system has been used for an international comparison of current shunts [46] from 10 A to 100 A.
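To tie the quantities of subsections E and F together numerically, the sketch below evaluates the AC-DC difference definition (1), the shunt–thermal-converter combination relation (2), and a commonly used first-order phase-angle estimate from a shunt's inductance and capacitance, phi ≈ ω·(L/R − R·C), which is consistent with the idea behind [42] but is not claimed to be that paper's exact formula. All numeric values are illustrative placeholders, not data from the cited papers; only the 90 Ω PMJTC input resistance is taken from the text above.

```python
# Numeric illustration of (1), (2) and a first-order phase-angle estimate.
# All values are placeholders, not results from the cited papers.
import math

def acdc_difference(v_ac: float, v_dc: float) -> float:
    """Eq. (1): relative AC-DC difference of the output voltage."""
    return (v_ac - v_dc) / v_dc

def combination_difference(delta_v: float, delta_i: float, delta_r: float,
                           r_shunt: float, r_tvc: float) -> float:
    """Eq. (2): AC-DC transfer difference of a shunt + thermal converter."""
    return (r_tvc / (r_shunt + r_tvc)) * delta_v \
         + (r_shunt / (r_shunt + r_tvc)) * delta_i + delta_r

def phase_angle_rad(freq_hz: float, r: float, l: float, c: float) -> float:
    """First-order estimate phi ~ omega * (L/R - R*C), i.e. omega * tau."""
    return 2 * math.pi * freq_hz * (l / r - r * c)

# Placeholder numbers: 0.1 ohm shunt, 90 ohm PMJTC, invented deltas and L, C.
print(f"(1): {acdc_difference(1.000012, 1.000000)*1e6:.1f} uV/V")
print(f"(2): {combination_difference(5e-6, 8e-6, 1e-6, 0.1, 90.0)*1e6:.2f} ppm")
print(f"phase at 100 kHz: {phase_angle_rad(1e5, 0.1, 1e-9, 2e-10)*1e6:.1f} urad")
```

Because the TVC resistance is much larger than the shunt resistance, the voltage term dominates (2), which is why the combination is usually characterized together rather than term by term.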
G. DC characterization of shunts

To properly characterize AC shunts and to calculate the measurement uncertainty contribution of the shunts in power measurement, in addition to the phase angle errors and the AC-DC difference it is also necessary to evaluate the shunt performance in terms of the power coefficient (level dependence), the temperature coefficient and the drift of the DC resistance of the shunt. The level dependence of the step-up procedure has been evaluated [47]. The step-up procedure is based on the assumption that the AC-DC difference of a shunt and thermal converter combination is the same at the two current levels used in the step-up procedure; this can produce systematic errors that are added in each step and can be considerable at the end of the chain. The authors argue that, because of their design, an appreciable low-frequency level dependence was not expected in the shunts but in the PMJTC. In this paper the TEE connector used to connect two shunts has been analyzed as well. The level dependence of two different shunts has also been evaluated [48] in order to reduce the contribution of the current level dependence to the uncertainty budget. The temperature coefficients (TCR) of the shunts and their drift [6] were evaluated. The temperature coefficient was measured in a temperature chamber, where the temperature was set to 23 °C, 18 °C, and 28 °C. This temperature change was large enough to determine the TCRs at the shunt's working point. The TCRs proved to be linear in the 18 °C – 28 °C temperature range within the measurement uncertainty. The temperature rise due to the load current was calculated based on the resistor power coefficient specification. Drift was measured during several-month periods by comparison with a known reference resistor using a direct current comparator resistance bridge. Measurements showed that this drift was fairly linear with time and that a linear drift slope could be calculated for each current shunt. The drift was between 2.8 and 13.8 µΩ/Ω per year for all shunts. A complete DC characterization of shunts at the ppm level has been conducted [49]. A set of four working resistance standards (0.1 Ω – 0.001 Ω) and a Fluke 8508A reference multimeter were used for comparison purposes. The system established at CMI, which also includes an oil bath and a thermostatic chamber, was used to characterize foil and cage shunts up to 100 A. The second paper [50] gives a full DC characterization of AC shunts in the range of 30 mA to 10 A. The characterization includes the long-term drift, the temperature coefficient, and the power coefficient, which all appear to have effects on the level of 1-10 μΩ/Ω. The shunt investigated was designed and built by JV [11].

In Table 1, the current ranges of the shunts manufactured at different national metrology institutes and companies are summarized:

Table 1: Available current shunts
SIQ: 100 mA – 100 A
NRC: 1 mA – 100 A
INRIM: 5 mA – 20 A
SP: 10 mA – 100 A
CMI: 30 mA – 10 A
JV: 30 mA – 10 A
CSIRO: 100 mA – 20 A
BEV: 10 mA – 100 A
VNIIM: 1 A – 10 A
INTI: 5 A – 10 A
Fluke: 1 mA – 100 A
Transmille: 1 mA – 100 A
Guildline: 10 mA – 100 A

III. CONCLUSION

In this paper an overview of wide-band current transducers is presented. All available papers presenting the available AC shunts and transformers are included, and commercial AC shunts are also mentioned. A special section presents the methods and procedures for the calibration and characterization of shunts and transformers with respect to their AC-DC difference, phase angle error and DC characterization.

IV. ACKNOWLEDGEMENT

In the name of our research group (Department of Electrical Engineering Fundamentals and Measurements, Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia) I would like to thank the Croatian Science Foundation for their support and motivation.
This work has been fully supported by Croatian Science Foundation under the project Metrological infrastructure for smart grid IP2014-09-8826. 149 LITERATURE [1] M. M. Halawa, Implementation and Verification of Building-up AC-DC Current Transfer Up to 10 A, International journal of electrical engineering & technology (IJEET), 2012, pp. 262 – 273 [2] P. S. Filipski, and M. Boecker, AC–DC Current Shunts and System for Extended Current and Frequency Ranges, IEEE Transactions on Instrumentation and Measurement, 2006, pp. 1222 – 1227 [3] S. Svensson and K. E. Rydler, A Measuring System for the Calibration of Power Analyzers, IEEE Transactions on Instrumentation and Measurement, 1995, pp. 316 – 317 [4] K. E. Rydler and V. Tarasso, Extending ac-dc current transfer measurement to 100 A, 100 kHz, Precision Electromagnetic Measurements Digest, 2008, pp. 28 - 29 [5] I. Budovsky, A micropotentiometer-based system for low voltage calibration of alternating voltage measurement standards, Precision Electromagnetic Measurements Digest, 1996, pp. 497 - 498 [6] B. Voljc, M. Lindic, and R. Lapu, Direct Measurement of AC Current by Measuring the Voltage Drop on the Coaxial Current Shunt, IEEE Transactions on Instrumentation and Measurement, 2009, pp. 863 – 867 [7] V. N. Zachovalova, AC-DC current transfer difference in CMI, Precision Electromagnetic Measurements Digest, 2008, pp. 362 – 363 [8] V. N. Zachovalova, M. Síra and P. Bednari, New generation of cagetype current shunts at CMI, 20th IMEKO TC4 International Symposium, 2014, pp. 59 – 64 [9] M. Garcocz, P. Scheibenreiter, W. Waldmann, and G. Heine, Expanding the measurement capability for AC-DC current transfer at BEV, Precision Electromagnetic Measurements Digest, 2004, pp. 461-462 [10] U. Pogliano, G. C. Bosco, and D. Serazio, Coaxial Shunts as AC–DC Transfer Standards of Current, Precision Electromagnetic Measurements Digest, 2008, pp. 30-31 [11] K. Lind, T. Sørsdal and H. Slinde, Design, Modeling, and Verification of High-Performance AC–DC Current Shunts From Inexpensive Components, IEEE Transactions on Instrumentation and Measurement, 2008, pp. 176 - 181 [12] L. D. Lillo, H. Laiz, E. Yasuda and R. García, Comparison of three different shunts design for AC-DC current transfer, 17th IMEKO TC4 International Symposium, 2004, [13] J. Zhang, X. Pan, H. Huang, L. Wang and D. Zhang, A coaxial AC shunt with calculable AC-DC difference, Precision Electromagnetic Measurements (CPEM), 2010, pp. 597 – 598 [14] A. M. R. Franco, E. Tóth, R. M. Debatin, and R. Prada, Development of a power analyze, 11th IMEKO TC-4 Symposium, Trends in electrical measurements and instrumentation, 2001, pp 1-5 [15] L. Zuliang, W. Lei, L. Min, L. Lijuan, and Z. Hao, Harmonic Power Standard at NIM and Its Compensation Algorithm , IEEE Transactions on Instrumentation and Measurement, 2009, pp. 180-187 [16] R. Carranza, S. Campos, A. Castruita, T. Nelson, A. Ribeiro, E. So, L. D. Lillo, A. Spaggiari, D.Slomovitz, D. Izquierdo, C. Faverio, H. Postigo, H. Díaz, H. Sanchez, J. Gonzalez and Á. Z. Triana, Precision Electromagnetic Measurements (CPEM 2014), pp. 302-303 [17] E. Mohns, G. Ramm, W.G.K. Ihlenfeld, L. Palafox and H. Moser, The PTB Primary Standard for Electrical AC Power , MAPAN - Journal of Metrology Society of India, 2009, pp. 15 - 19 [18] U. Pogliano, Use of Integrative Analog-to-Digital Converters for HighPrecision Measurement of Electrical Power, IEEE Transactions on Instrumentation and Measurement, 2002, pp. 1315 - 1318 150 [19] B. Voljc, M. Lindic, B. Pinter, M. Kokalj, Z. 
Svetik, and R. Lapuh, “Evaluation of a 100 A current shunt for the direct measurement of AC current, IEEE Transactions on Instrumentation and Measurement, 2013, pp. 1675–1680 [20] P. S. Filipski and M. Boecker, AC-DC current transfer standards and calibrations at NRC, Simposio de Metrología, Mexico, 2006 [21] M. Garcocz, P. Scheibenreiter, W. Waldmann, G. Heine, Expanding the measurement capability for AC-DC current transfer at BEV, Precision Electromagnetic Measurements Digest, 2004, pp. 461-462 [22] H.Fujiki, A.Domae and Y-Nakamura, Analysis of the time constant for the bifilar calculable ac/dc resistors, Precision Electromagnetic Measurements, 2002, pp. 342-343 [23] R. E. Elmquist, Calculable Coaxial Resistors for Precision Measurements, Instrumentation and Measurement Technology Conference, 1999, pp. 1468 - 1471 [24] E. So, D. Angelo, T. Tsuchiyama, T. Tadokoro, B. C. Waltrip, and T. L. Nelson, Intercomparison of Calibration Systems for AC Shunts Up to Audio Frequencies, IEEE Transactions on Instrumentation and Measurement, 2005, pp.507 – 511 [25] D. L. H. Gibbings, A design for resistors of calculable a.c./d.c. resistance ratio, Electrical Engineers, Proceedings of the Institution, 2010, pp. 335 - 347 [26] T. Tsuchiyama and T. Tadokoro, Development of a high precision AC standard shunt for AC power measurement, Precision Electromagnetic Measurements, 2002, pp. 254 - 255 [27] O. B. Laug, T. M. Souders, and B. C. Waltrip, A Four-Terminal Current Shunt With Calculable AC Response, National Institute of Standards and Technology, 2004, pp. 1-56 [28] Y. Giilmez, G. Gulmez, E. Turhan, T. Ozkan, M. Cinar, L. Sozen, A new design for calculable resistor, Precision Electromagnetic Measurements, 2002, pp. 348 – 349 [29] J. Kucera, E. Vollmer, J. Schurr and J. Bohacek, Calculable resistors of coaxial design, Measurement science and technology, 2009, [30] V. N. Zachovalova, On the Current Shunts Modeling, IEEE Transactions on Instrumentation and Measurement, 2014, pp. 1620-162 [31] B. Pinter, M. Lindic, B. Voljc, Z. Svetik, and R. Lapuh, Modeling of AC/DC current shunts, Precision Electromagnetic Measurements (CPEM), 2010, pp. 599–600. [32] M. Garcocz, P. Scheibenreiter, W. Waldmann, and G. Heine, Expanding the measurement capability for AC-DC current transfer at BEV, Precision Electromagnetic Measurements Digest, 2004, pp. 461–462 [33] J. R. Kinard, T. E. Lipe, and C. B. Childers, AC-DC difference relationships for current shunt and thermal converter combinations, Precision Electromagnetic Measurements, 1991, pp. 352–355 [34] V. N. Zachovlova, M. Sira and J. Streit, Current and frequency range exstension of AC-DC current transfer difference measurement system at CMI, Precision Electromagnetic Measurements (CPEM), 2010, pp. 605 – 606 [35] K. E. Rydler, High precision automated measuring system for AC–DC current transfer standards, IEEE Transactions on Instrumentation and Measurement, 1993, pp. 608–611 [36] E. So, D. Angelo, T. Tsuchiyama, T. Tadokoro, B. C. Waltrip, and T. L. Nelson, Intercomparison of Calibration Systems for AC Shunts Up to Audio Frequencies, IEEE Transactions on Instrumentation and Measurement, 2005, pp. 507 – 511 [37] K.E. Rydler and V. Tarasso, A method to determine the phase angle errors of an impedance meter, Precision Electromagnetic Measurements Digest, 2004, pp. 123-124 MIPRO 2016/MEET [38] S. Svensson, K.E. Rydler, and V. 
Tarasso, Improved model and phaseangle verification of current shunts for ac and power measurements Precision Electromagnetic Measurements Digest, 2004 p. 82-83 [39] K. E. Rydler, T. Bergsten and V. Tarasso, Determination of Phase Angle Errors of Current Shunts for Wideband Power Measurement, Precision Electromagnetic Measurements (CPEM), 2012, pp. 284-285 [40] X. Pan, J. Zhang, H. Shao, W. Liu, Y. Gu, X. Ma, B. Wang, Z. Lu, and D. Zhang, Measurement of the Phase Angle Errors of High Current Shunts at Frequencies up to 100 kHz, IEEE Transactions on Instrumentation and Measurement, 2013, pp. 1652 - 1657 [41] J. Zhang, X. Pan, W. Liu, Y. Gu, B. Wang, and D. Zhang, Determination of Equivalent Inductance of Current Shunts at Frequency Up to 200 kHz, IEEE Transactions on Instrumentation and Measurement, 2013, pp. 1664 – 1668 [42] X. Pan, J. Zhang, X. Ma, Y. Gu, W. Liu, B. Wang, Z. Lu, and D. Zhang, A Coaxial Time Constant Standard for the Determination of Phase Angle Errors of Current Shunts, IEEE Transactions on Instrumentation and Measurement, 2013, pp. 199-204 [43] I. Budovsky, Measurement of Phase Angle Errors of Precision Current Shunts in the Frequency Range From 40 Hz to 200 kHz, IEEE Transactions on Instrumentation and Measurement, 2007, pp. 284 – 288 [44] X. Pan, Q. Wang, J. Zhang, S. Zeng, W. Chen, Measurement of the level dependence in phase angle errors of the high current shunts, Precision Electromagnetic Measurements (CPEM), 2012, pp. 160-161 [45] U. Pogliano, B. Trinchera and D. Serazio, Wideband digital phase comparator for high current shunts, Precision Electromagnetic Measurements (CPEM), 2010, pp. 135-136 [46] G. C. Bosco, M. Garcocz, K. Lind, U. Pogliano, G. Rietveld, V. Tarasso, B. Voljc, and V. N. Zachovalova, Phase Comparison of High-Current Shunts up to 100 kHz, Precision Electromagnetic Measurements (CPEM), 2010, pp. 229-230 [47] J. D. de Aguilar, R. Caballero, and Y. A. Sanmamed, Realization and Validation of the 10 mA–100 A Current Standard at CEM, IEEE Transactions on Instrumentation and Measurement, 2014, pp. 1753 – 1759 [48] T. Funck and M. Klonz, Improved AC–DC Current Transfer Step-Up With New Current Shunts and Potential Driven Guarding, IEEE Transactions on Instrumentation and Measurement, 2007, pp. 361-364 [49]V. N. Zachovalova, M. Sira, J. Streit, and L. Indra, Measurement system for high current shunts DC characterization at CMI, Precision Electromagnetic Measurements (CPEM), 2010, pp. 607-608 [50] G. Rietveld, J. H. N. van der Beek, and E. Houtzager, DC Characterization of AC Current Shunts for Wideband Power Applications, IEEE Transactions on Instrumentation and Measurement, 2011, pp. 2191 – 2194 MIPRO 2016/MEET 151 Laboratory model for design and verification of synchronous generator excitation control algorithms S. Tusun*, I. Erceg* and I. Sirotić* * Faculty of Electrical Engineering and Computing/Department of Electric Machines, Drives and Automation, Zagreb, Croatia stjepan.tusun@fer.hr, igor.erceg@fer.hr, igor.sirotic@fer.hr Abstract - This paper presents a laboratory model of synchronous generator excitation system based on National Instruments cRIO real-time industrial controller. It was specially designed for development and verification of classical linear and modern nonlinear excitation control algorithms. Real-time Clarke and Park transformations were implemented on the FPGA for measurements of the generator load angle, voltages and currents in the generator dq-frame. 
An automatic voltage regulator (AVR) and a power system stabilizer (PSS2A) were implemented and experimentally verified on an 83 kVA synchronous generator. Tests of step changes of the generator voltage and mechanical power references, and a test of transmission line disconnection, were conducted. Experimental results were compared to simulation results of the designed model in Matlab/Simulink.

I. INTRODUCTION

The electrical power system is one of the largest technical systems. Increasing power demands [1] and the introduction of new technologies, mostly based on power electronics [2], are the main reasons for changes and growth of the power system. Most of the produced electric energy comes from synchronous generators installed in hydro, thermal and nuclear power plants. Long distances between power plants and the operation of a synchronous generator near its capability limits can lead to poor damping of electromechanical oscillations among groups of generators or between specific areas of the power system [3]. These oscillations reduce the power transfer capability limits and can lead to a collapse of the power system [4]. One way to enhance the damping is by controlling the synchronous generator excitation. Modern excitation systems are equipped with an automatic voltage regulator (AVR) and a power system stabilizer (PSS). Design and tuning of the PSS is accomplished via linear techniques around an operating point [5-6]. However, the synchronous machine and the power system are highly nonlinear, and the generator operating point can change significantly due to load and topology changes and large disturbances. Therefore, nonlinear excitation strategies were proposed, such as adaptive and intelligent control, feedback linearization, Lyapunov theory and energy-shaping based controllers. Nonlinear excitation control strategies mostly have complex control algorithms and require the measurement or estimation of the generator load angle and network parameters. Commercially available excitation systems are closed for modifications of the control algorithms and offer predefined control options which are based on standard excitation types [7]. Implementation and testing of new excitation control strategies on these systems is very limited, which was the main motivation for the development of a new and open digital control system for excitation control. The excitation control system DIRES21 was the first digital system developed at the Faculty of Electrical Engineering and Computing in Zagreb (Department of Electric Machines, Drives and Automation). It was based on four ADMC300 DSP processors [8]. The programming and configuration of the control algorithms was accomplished with a graphically oriented software development tool; more than 300 predefined blocks were implemented manually in assembly language. After DIRES21, another system was developed based on the Texas Instruments DSP TMS320F281 [9]. These two digital systems were specially designed for excitation control, and special skills and programming knowledge were required for the implementation of control algorithms. They had limited communication capabilities and there was no simple way of transferring large blocks of measurement data. Over time, the need grew for a new and standardized development environment that would allow fast prototyping and testing through a graphical development environment. In [10] the first prototype of an excitation control system based on the National Instruments cRIO platform was experimentally tested.
This paper is a continuation of [10] and presents a new laboratory model for the design, verification and fast prototyping of excitation control algorithms in graphics-based software. A classical automatic voltage regulator (AVR) and a power system stabilizer (type PSS2A) were implemented on the digital system using standard LabView blocks, and experimental tests were conducted to show the system performance.

II. LABORATORY MODEL

The laboratory model used for verification of the excitation control algorithms is depicted in Fig. 1. The laboratory model enables the verification of the control algorithms in various control regimes (reference voltage and mechanical power step changes, turning transmission lines on and off, short-circuit experiments, etc.).

III. DIGITAL CONTROL SYSTEM

The digital control system, used for control algorithm programming and data acquisition, is based on the National Instruments CompactRIO platform (cRIO-9014). cRIO is a combination of a real-time controller, reconfigurable I/O modules (RIO) and an FPGA module. The I/O modules are used for data acquisition and analog-to-digital conversion (ADC). Generator currents and voltages are first measured by voltage and current transducers and then passed through analog anti-aliasing filters (Fig. 1).

Figure 1. Laboratory model of the 83 kVA synchronous generator

The laboratory model is located at the Faculty of Electrical Engineering and Computing in Zagreb and consists of:
• a salient-pole synchronous generator mechanically coupled with two DC motors,
• a transformer and inductances that simulate two parallel transmission lines,
• a thyristor rectifier for control of the DC machines' armature current,
• an IGBT rectifier for the synchronous machine excitation,
• a digital control system for the excitation control algorithms.

The synchronous generator (83 kVA) is driven by two DC motors (44 kW each) which are connected in series. The parameters of the synchronous generator and DC motors are given in Appendix A (Tab. III and IV). Inductors L1, L2 and L3 are used to represent the parallel transmission lines. Transmission line disconnection and three-phase short-circuit tests can be performed by operating circuit breakers Q2 and Q1. Transformer T1 is used to simulate network voltage changes in the range of ±10%. The connection of T1 to the 10 kV grid is accomplished by transformer T2. The parameters of the inductors and transformers are given in Appendix A (Tab. V and VI). The DC motors are powered from the thyristor rectifier. The protection functions and parameterization of the rectifier are done by standard procedures [11]. The built-in speed and torque controllers are configured as two-stage controllers. In the first stage, when the generator is not connected to the grid, the speed controller maintains synchronous speed (near nominal). In the second stage, when the generator circuit breaker is closed (the generator is connected to the grid), torque control is activated. The speed and torque references are set by a Siemens S7-1200 PLC. The PLC is also used as the communication interface between the thyristor rectifier and the cRIO digital system that controls the generator excitation. The cRIO program consists of microcontroller program loops and FPGA program loops. Microcontroller program loops are executed in real time and monitored by the real-time operating system (RT OS), while FPGA program loops are executed on dedicated hardware (Fig. 2).

A. Microcontroller program loops

Two program loops are periodically executed on the microcontroller: the communication loop and the control loop (Fig. 2).
For the implementation of the control and communication loops, standard LabView library programming blocks were used. By using the standard blocks, optimal code execution is guaranteed. The main task of the communication loop is the transfer of the measured data, control signals and user interface commands. The loop is executed periodically with a frequency of 10 Hz. The Modbus/TCP communication protocol is used for communication with the Siemens PLC S7-1200. The control loop is used for the execution of the excitation control algorithms. The execution frequency of this loop is 500 Hz. The input measurements are converted by the FPGA ADCs, scaled in the FPGA, and used for the calculation of the output control value.

B. FPGA program loops

Three time-critical program loops are implemented on the FPGA module: the counter loop, the PWM loop and the ADC measurement loop. The counter loop is used for counting the digital encoder impulses. Positive and negative edges of the encoder's A and B impulses are counted. From the counter value the generator rotor angle and speed are calculated. The control signal for the IGBT rectifier is generated in the PWM loop. A triangular carrier signal with a frequency of 1250 Hz is used for the PWM signal generator. The generator excitation is powered from an independent voltage source through the IGBT rectifier (Fig. 1). The output voltage of the IGBT rectifier is controlled by the PWM signal from the digital control system.

Figure 2. Structure of the digital control system

The excitation control system consists of the AVR and the PSS. The AVR control loop consists of an inner field current control loop and an outer generator voltage control loop. The field current controller is proportional (P) and the generator voltage controller is proportional-integral (PI). The parameters of the generator voltage and field current controllers used for the simulations and experiments are given in Tab. I.

Figure 3. Structure of the ADC measurement loop

The ADC loop is used for signal conditioning and calculation of the Clarke and Park transforms (Fig. 3). This loop is executed at a frequency of 2500 Hz and is synchronized with the PWM loop. The signals from the ADCs are scaled to relative units (pu), and the Clarke transform is used to calculate the αβ components:

$$ i_{\alpha\beta} = \begin{bmatrix} i_\alpha \\ i_\beta \end{bmatrix} = \frac{2}{3} \begin{bmatrix} 1 & -\frac{1}{2} & -\frac{1}{2} \\ 0 & \frac{\sqrt{3}}{2} & -\frac{\sqrt{3}}{2} \end{bmatrix} \begin{bmatrix} i_a \\ i_b \\ i_c \end{bmatrix} $$

Furthermore, the Park transform is used to calculate the generator voltage dq components:

$$ u_{dq} = \begin{bmatrix} u_d \\ u_q \end{bmatrix} = \begin{bmatrix} \sin\theta & -\cos\theta \\ \cos\theta & \sin\theta \end{bmatrix} \begin{bmatrix} u_\alpha \\ u_\beta \end{bmatrix} $$

The generator load angle is then calculated from the voltage dq components [12]. The generator active and reactive power and the effective values of the generator currents and voltages are also calculated from the αβ components. These signals are then filtered by a second-order Butterworth filter. Additionally, the generator angular speed and load angle are filtered with a 10 Hz notch filter to damp the base mechanical frequency. These two digital filters are part of the standard LabView FPGA library. Finally, the values of the filtered signals are transferred to the microcontroller memory and used as inputs to the control loop. The generator speed, rotor angle and αβ components of the measured signals are transferred to the microcontroller shared memory. These values are then collected by a GUI application which is executed on a PC. The GUI application is used for the transfer and storage of measurement data. Furthermore, the GUI is used for reference setup and user control of the digital control system.
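As an illustration of the two transforms and of the subsequent load-angle calculation, a minimal NumPy sketch is given below. It is not the LabView FPGA implementation described in the paper; the angle convention and the load-angle formula are placeholders (the exact definition follows [12]).

```python
import numpy as np

# Amplitude-invariant Clarke transform (abc -> alpha-beta), as in the equation above
CLARKE = (2.0 / 3.0) * np.array([[1.0, -0.5, -0.5],
                                 [0.0, np.sqrt(3) / 2, -np.sqrt(3) / 2]])

def clarke(x_abc):
    """abc -> alpha-beta components."""
    return CLARKE @ np.asarray(x_abc)

def park(x_ab, theta):
    """alpha-beta -> dq components, using the rotation written in the text."""
    P = np.array([[np.sin(theta), -np.cos(theta)],
                  [np.cos(theta),  np.sin(theta)]])
    return P @ np.asarray(x_ab)

# Example: one sample of a balanced three-phase voltage (per unit)
theta = 0.7    # rotor angle from the encoder counter, rad (assumed value)
wt = 1.2       # electrical angle of the terminal voltage, rad (assumed value)
u_abc = [np.cos(wt), np.cos(wt - 2*np.pi/3), np.cos(wt + 2*np.pi/3)]

u_ab = clarke(u_abc)
u_dq = park(u_ab, theta)
delta = np.arctan2(u_dq[0], u_dq[1])   # placeholder load-angle convention, see [12]
print(u_dq, np.degrees(delta))
```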
IV. CLASSICAL EXCITATION CONTROL SYSTEM

The classical generator excitation control, the synchronous generator [3-4], the transmission lines and the power system were modeled in Matlab/Simulink (Fig. 4). The PSS was modeled with transfer function blocks from Matlab/Simulink as the PSS2A type [7]. The main task of the PSS is the damping of the synchronous generator electromechanical oscillations [3-4]. The parameters of the PSS are given in Tab. II.

Figure 4. Matlab/Simulink model of the classical excitation control system

TABLE I. PARAMETERS OF THE GENERATOR VOLTAGE AND FIELD CURRENT CONTROLLER

Controller         P gain   I gain   Upp. lim.   Low. lim.
Voltage [pu]       10       20       4           0
Field curr. [pu]   3        -        3           0

TABLE II. PARAMETERS OF PSS2A

Tw1,w3   T1      T2       T6     T7    T8      T9
1 s      0.3 s   0.05 s   0 s    1 s   0.2 s   0.09 s

Ks1   Ks2   Ks3   N   M   Upp. lim.   Low. lim.
5     0.5   1     1   4   0.1 pu      -0.1 pu

V. EXPERIMENTAL VERIFICATION

For the experimental verification of the presented laboratory model, the AVR and PSS2A were implemented on the digital control system. The discrete PI controller was implemented using the standard PID block from LabView [13]. To implement the PSS2A on the digital control system it was necessary to convert its transfer function from the continuous to the discrete time domain. This step was accomplished by use of the bilinear transformation in Matlab/Simulink (a short illustrative sketch of this conversion is given at the end of subsection B). To validate the AVR and PSS2A, the following tests were performed on the laboratory model:
• voltage reference change,
• mechanical power change, and
• disconnection of a transmission line.
For each experiment the generator terminal voltage, active and reactive power, load angle, angular speed and field current are shown (Figs. 5 to 8).

A. Voltage reference change

A voltage reference step change from 1.0 to 0.9 pu was conducted for the case of the synchronous generator connected to the distribution network. The mechanical power reference of the generator was constant (0.5 pu). A simulation in Matlab/Simulink was also performed under the same conditions for validation of the proposed laboratory model.

Figure 5. Comparison of experimental (blue) and simulation (red dash) results for a voltage reference change for the system with AVR

Figure 6. Comparison of experimental (blue) and simulation (red dash) results for a voltage reference change for the system with PSS2A

As can be seen from Figs. 5 and 6, there is no significant difference between the simulation and experimental results, either in the transient or in the steady state. The greatest difference is in the field current signal, due to unmodeled hysteresis in the simulation model. Oscillations in the signals of the active power, angular speed and load angle were present for the system with AVR (Fig. 5). These oscillations were successfully damped by adding the PSS2A signal to the AVR input (Fig. 6). An active power feedback loop was not used in the laboratory model, so there were differences in the pre- and post-step steady-state values of the active power (Figs. 5 and 6).

B. Mechanical reference change

A mechanical power reference step change from 0.3 to 0.6 pu was conducted for the case of the synchronous generator connected to the distribution network. The generator voltage reference was constant (1.0 pu). Fig. 7 shows a comparison of the experimental results for the system without and with PSS2A. As expected, the system with PSS2A damps the electromechanical oscillations better than the system with AVR (without PSS2A). It is also important to note that the system with PSS2A has a greater overshoot in the generator voltage signal, which is the trade-off for the better oscillation damping.
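The continuous-to-discrete conversion mentioned in Section V can be reproduced for a single lead-lag stage of the PSS2A with the bilinear (Tustin) transformation. The sketch below uses SciPy rather than the Matlab/Simulink workflow of the paper, takes the time constants from Tab. II and the 500 Hz control-loop rate from Section III.A, and is only meant to illustrate the step, not the exact block set used by the authors.

```python
from scipy import signal

fs = 500.0              # control-loop execution frequency (Hz), Section III.A
T1, T2 = 0.3, 0.05      # lead-lag time constants from Tab. II (s)

# One lead-lag stage of the PSS2A: H(s) = (1 + T1*s) / (1 + T2*s)
b_s = [T1, 1.0]         # numerator coefficients in descending powers of s
a_s = [T2, 1.0]         # denominator coefficients in descending powers of s

# Bilinear (Tustin) transformation to the discrete-time domain at 500 Hz
b_z, a_z = signal.bilinear(b_s, a_s, fs)
print("H(z) numerator:  ", b_z)
print("H(z) denominator:", a_z)
```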
C. Disconnection of transmission line

An experiment of disconnection of one of the transmission lines was conducted while the generator voltage reference (0.9 pu) and the mechanical power reference (0.4 pu) were constant. The disconnection of the line (inductors L1 and L2) was done by circuit breaker Q2 (Fig. 1). As can be seen from Fig. 8, the system with PSS2A damps the electromechanical oscillations better than the system with AVR (without PSS2A).

Figure 7. Experimental results for a mechanical power reference change for the system with AVR (blue) and the system with PSS2A (red dash)

Figure 8. Experimental results for disconnection of one transmission line for the system with AVR (blue) and the system with PSS2A (red dash)

VI. CONCLUSION

A laboratory model for the design and verification of synchronous generator excitation control algorithms was presented. The proposed model can be efficiently used for the development and testing of classical linear and modern nonlinear excitation control algorithms. The implementation of control algorithms on the digital control system is accomplished by use of well-known graphically based development software. Furthermore, the graphics-based software reduces the time for control law development and verification. An automatic voltage regulator and a power system stabilizer were implemented and experimentally tested. Experimental tests of the generator voltage change, mechanical power change and disconnection of a transmission line showed that the PSS2A successfully damps electromechanical oscillations. Furthermore, the experimental results were compared to results from the Matlab/Simulink model. The developed simulation model can be used for parameter estimation and for testing of new control algorithms before prototyping on the laboratory model.

APPENDIX A. LABORATORY MODEL PARAMETERS

TABLE III. NOMINAL DATA OF SYNCHRONOUS GENERATOR
Voltage: 400 V
Current: 120 A
Power: 83 kVA
Frequency: 50 Hz
Speed: 600 r/min
Power factor (cos φ): 0.8
Excitation voltage: 100 V
Excitation current: 11.8 A

TABLE IV. NOMINAL DATA OF DC MOTORS
Voltage: 220 V
Current: 192 A
Power: 44,24 kW
Speed: 600 r/min

TABLE V. NOMINAL DATA OF INDUCTORS
Inductor   Inductance   Current
L1         3,5 mH       86 A
L2         1,35 mH      180 A
L3         0,45 mH      226 A

TABLE VI. NOMINAL DATA OF TRANSFORMERS
Transformer   Voltage         Power     uk
T1            380/380±10% V   145 kVA   6%
T2            10/0,4 kV       1 MVA     6%

REFERENCES
[1] A. A. Bayod-Rújula, “Future development of the electricity systems with distributed generation,” Energy, vol. 34, no. 3, pp. 377–383, Mar. 2009.
[2] F. Blaabjerg, Z. Chen, and S. B. Kjaer, “Power electronics as efficient interface in dispersed power generation systems,” IEEE Transactions on Power Electronics, vol. 19, no. 5, pp. 1184–1194, 2004.
[3] P. Kundur, N. J. Balu, and M. G. Lauby, Power System Stability and Control, 1st edition. New York: McGraw-Hill, 1994.
[4] G. Andersson, P. Donalek, R. Farmer, N. Hatziargyriou, I. Kamwa, P. Kundur, N. Martins, J. Paserba, P. Pourbeik, J. Sanchez-Gasca, R. Schulz, A. Stankovic, C. Taylor, and V. Vittal, “Causes of the 2003 major grid blackouts in North America and Europe, and recommended means to improve system dynamic performance,” IEEE Transactions on Power Systems, vol. 20, no. 4, pp. 1922–1928, Nov. 2005.
[5] M. J. Gibbard, “Robust design of fixed-parameter power system stabilisers over a wide range of operating conditions,” IEEE Transactions on Power Systems, vol. 6, no. 2, pp. 794–800, 1991.
[6] D. Sumina, N. Bulić, and M.
Mišković, “Parameter tuning of power system stabilizer using eigenvalue sensitivity,” Electric Power Systems Research, vol. 81, no. 12, pp. 2171–2177, Dec. 2011.
[7] “IEEE Recommended Practice for Excitation System Models for Power System Stability Studies,” IEEE Std 421.5-2005 (Revision of IEEE Std 421.5-1992), pp. 0_1–85, 2006.
[8] T. Idžotić, D. Sumina, and I. Erceg, “DSP based excitation control system for synchronous generator,” presented at EDPE 2005, 2005.
[9] D. Sumina, N. Bulić, and M. Mišković, “Application of a DSP-Based Control System in a Course in Synchronous Machines and Excitation Systems,” International Journal of Electrical Engineering Education, vol. 49, no. 3, pp. 334–348, Jul. 2012.
[10] I. Erceg, S. Tusun, and G. Erceg, “The use of programmable automation controller in synchronous generator excitation system,” in IECON 2012 - 38th Annual Conference on IEEE Industrial Electronics Society, 2012, pp. 2055–2060.
[11] Siemens, Simoreg DC-MASTER, “6RA70 Microprocessor-Based Converters from 6 kW to 2500 kW for Variable-Speed DC Drives,” 13th ed., Siemens, Nov. 2007.
[12] S. Tusun, I. Erceg, and G. Erceg, “Synchronous generator load angle estimation using Parks transformation,” 9. savjetovanje HRO CIGRE 2009, 2009.
[13] National Instruments, PID and Fuzzy Logic Toolkit User Manual, 2009, available at: www.ni.com/pdf/manuals/372192d.pdf

The European Project SolarDesign
Illustrating the Role of Standardization in the Innovation System

W. Brenner*, N. Adamovic*
* Vienna University of Technology, Institute of Sensor and Actuator Systems, Vienna, Austria
werner.brenner@tuwien.ac.at

Abstract - The Framework Program 7 project SolarDesign is focused on new photovoltaic (PV) integrated product solutions. To achieve this, new materials and flexible production and business processes in PV-powered product design and architecture had to be developed. Promising markets such as sustainable housing, temporary building structures, outdoor activities, electro-mobility, road lighting and mobile computing drive the demand for decentralized, attractive energy solutions. As products have to respect existing standards, SolarDesign decided from the beginning to proactively integrate standardization into the project's efforts. The example of a PV-driven streetlamp demonstrates the influence of standards on the performance requirements of the innovative PV materials.

I. INTRODUCTION

Designers, architects and industrial manufacturers share a common interest in using photovoltaics (PV) as a decentralized and sustainable source of energy in their designs. Indeed, photovoltaic modules are the only viable renewable energy solution that can be integrated directly into objects such as devices, textiles and surfaces of buildings. Photovoltaics (PV) is widely recognized as one of the key technologies for the future energy supply. The cost reduction within the last 15 years is a result of the utilization of learning effects, the expansion of production capacities, extensive automation and standardization efforts [1]. SolarDesign reflects the consortium's commitment to actively contribute to a solid and sustainable standardization system that fosters added value for European industry.

II. SOLARDESIGN VISION

By adjusting the compositions and spectral responses of the functional layers of thin-film photovoltaics, not only can the overall efficiency be improved, but the energy yields under diffuse light conditions or at higher operating temperatures can also be influenced. This uniqueness of CIGS offers potential for exploitation.
The scientific and technical objectives (STOs) of this project are to develop: STO1: A flexible scribing and printing technology that allows producing a given photovoltaic module according to specific design requirements “on-the-fly”. This flexible interconnection is applied on the solar foil (i.e. an endless solar cell) with a minimum width of 300 mm and allows curved solar cells and interconnection patterns with a minimum radius of 10 mm. STO2: Novel materials for the underlying flexible solar cell technology to extend the design related degrees of freedom and to optimize the materials used for integrative solar applications. STO3: Novel materials for satisfying design related requirements on solar module level (the part that is most visible for the beholder). Focus will be laid on materials for the electrical conducting front grid to allow a high design freedom of patterns and colour variations, as well as using of different novel encapsulants allowing custom designed optical appearance. STO4: A methodological toolbox to provide design rules for the best solar cell super-structure and module design layout for a given application by using numerical modelling and simulation. STO5: The following applications demonstrate the developed technologies: solar charging (cover for tablet PC and solar powered radio), solar powered light, solar powered sensor networks for detection of fire in forest, urban solar lighting (compact solar street lighting system) for Product Integrated Photovoltaics (PIPV), Building Integrated Photovoltaics (BIPV) and integration of PV in textile support. III. Figure 1. “Technical constraints” versus “design freedom” 158 STANDARDIZATION AND INNOVATION Driven by increased competition many countries and companies have started efforts focused on implementation of international standardization in an early phase of research and development (R&D) [9]. MIPRO 2016/MEET Extensive research on the economic evaluation of new technology development and international standardization has been conducted [10], [11] which evidences that R&D that takes into consideration standardization can enhance the efficiency of investment and stimulate the introduction of developed technologies and solutions. Accordingly, world wide economy and mass production are typically associated with standardization. Standardization meanwhile extended its scope towards the research community in order to ease market access of innovations and to provide interoperability between new and existing products, services and processes Companies and research institutions can profit from participation in a Technical Committee (TC), which enables them to benefit from the ability to internalize the external information resources and thus to increase innovation skills [12]. In this light, the positive effect of participation can foster an enhancement of existing – and a creation of new – competence and innovations. On the other hand, it evidences interest in new technologies and solutions, shows the willingness to co-operate with other prospective users and indicates the intention to enter innovative markets. Early access to information, the degree of the strategic influence on new projects and the influence on technological development can be affected by the participants’ traditional or developing positioning (modes of participation, roles overtaken, functions, social rank etc.) [13]. 
Standards enable companies and research institutions to comply with relevant laws and regulations and help to ensure methodological robustness and wide acceptance [18]. Standards provide a guiding framework for FP and H2020 research projects, ensuring: tests and analytical work are carried out according to established norms, and developed technologies are interoperable with existing technologies and compliant with industry standards. By working to existing standards research projects have a higher chance of their outputs being accepted by scientific and industrial communities. Working with existing standards also enables researchers to recommend and contribute to new standards development, thus increasing their technical knowledge, widening their business networks and strengthening the market exploitation of their results [6]. to identify gaps in coverage or as the basis for revising resp. extending them. There are many thousands of standards of various types. Standards can be categorized into four major types • Fundamental standards - concerning terminology, conventions, signs, units etc • Test methods and analysis standards - which measure characteristics such as temperature, size, force and chemical composition • Specification standards - which define a product’s characteristics (product standards), or a service (service activities standards) and their performance thresholds such as fitness for use, interfaces and interoperability, health and safety, environmental protection, etc • Organisation standards - which describe the functions and relationships of a company, as well as elements such as quality management and assurance, maintenance, value analysis, logistics, project or system management, production management, etc SolarDesign has decided to integrate standardization into the project’s efforts as standards can enhance the economic value of research and innovation projects [3]. An early step was the decision which route answered SolarDesign’s needs best. There are several ways in which standards and standardization can be integrated • Integration of standardization bodies into the project’s consortium • Identification of a national, European or international Standards Body which can be an associate in the project, for example in a Steering Group • Informal participation of CEN-CENELEC Management Centre as associate in a Steering Group. This is possible in projects with specific work packages on standardization • Identification of the links SolarDesign has with ongoing standardization (e.g. "Photovoltaics in buildings" prEN 50583) • requesting a Project Liaison with an existing Technical Committee. As soon as this status has been granted, the project can participate in the Technical Committee's plenary meetings and contribute to the working groups. • The project requesting the Project Liaison can demonstrate formal collaboration with the European Standardization System (usually a requirement in FP7 calls); • The project representative can participate in the TC directly thus ensuring synergies between the research and standardization work (avoiding duplication of standardization work); • The project representative can propose a new work item (standard) directly to the TC without Figure 2. Benefits of Research of Using Standards [5] SolarDesign as many FP7 and H2020 projects started with an analysis and review of existing standards, either MIPRO 2016/MEET 159 going through a national delegation (direct impact on the standardization work program. 
International standardization is a highly social activity involving specialists, detailled observation of the written standardization process, and a clear strategic agenda. Often it is a highly organizational activity, and in many cases it is a highly individual-dependent activity. Consensus building is an important process which demands the following skills [14]: • sharing goals, costs, risks, quality requirements, measures, and alternatives • sharing awareness of each party’s positions, expectations and backgrounds • skills in communications, listening, persuasion and facilitation • ability to foster compromises It is impacted by the following issues: • ongoing explicit or hidden business models • human networks and trusts among participants • issues due to external organizational aspects • technology trends In the frame of SolarDesign the coordinating institution TU Wien motivates the partners to proactively contribute to the standards development processes. Standardization within Task 5.3 aims at providing a bridge connecting research to industry by promoting innovation and commercialization through dissemination of new ideas and best practice. This comprises circulating of new measurement and evaluation methods, implementation of new processes and procedures created by bringing together all interested parties such as manufacturers, researchers, designers and regulators concerning products, raw materials, processes or services. SolarDesign’s consortium puts strong efforts on including sessions on standardization in the frame of meetings for increasing the internal and external awareness of the topic (e.g. a standardization workshop was held at midterm). Integrating photovoltaics in architecture and industrial design is in its infancy and taking into account different European perspectives is crucial for its success. This is especially true for Building Integrated PV where legal conditions like construction codes or feed-in regimes are differing from one European country to another. By seeing design from a stylish or sensory perspective regional characteristics exist. Taking product designers and PV experts from different European countries can ensure the incorporation of varying design manifestations. Dissemination, implementation and standardization cannot be seen as separate activities. They must be fully coordinated and represent two sides of the same plan for success. Networking and documentary standards are promising and vital tools of disseminating SolarDesign’s research to the market place. 160 All documents of standardization relevance are jointly prepared by the partners and ongoingly are bundled and released by the task leader TU Wien. IV. SOLARDESIGN’S STREETLAMP - AN EXAMPLE HOW INNOVATION AND STANDARDIZATION INFLUENCE EACH OTHER In an early stage of the project the SolarDesign consortium identified where standards can benefit the project. In the next step the participants defined standardization issues. A typical problem of PV systems is the power loss due to temperature increase, because modules often operate close to the product envelope with low ventilation. SolarDesign’s partner institution EURAC therefore evaluated and compared the PV temperature conditions of different PV module categories (in terms of PV technology and material type). A simple linear expression for the evaluation of the PV module temperature is Tmod=Tamb+kG which links Tmod with the ambient temperature Tamb and the incident solar radiation flux G. 
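The expression Tmod = Tamb + k·G can be made concrete with a small numerical sketch. The Ross coefficient k used here is an assumed example value inside the 0.02–0.06 K·m²/W range discussed below, and the power temperature coefficient is likewise an assumed generic figure, not a measured SolarDesign value.

```python
def module_temperature(t_amb, irradiance, k=0.035):
    """T_mod = T_amb + k * G, with the Ross coefficient k in K*m^2/W
    (0.035 is an assumed value within the 0.02-0.06 range cited below)."""
    return t_amb + k * irradiance

def relative_power(t_mod, gamma=-0.0036, t_ref=25.0):
    """Linear power derating around STC; gamma is an assumed temperature
    coefficient of -0.36 %/K, not a SolarDesign measurement."""
    return 1.0 + gamma * (t_mod - t_ref)

t_mod = module_temperature(t_amb=30.0, irradiance=800.0)   # a warm, sunny condition
print(f"module temperature: {t_mod:.1f} C")
print(f"relative power vs. STC: {relative_power(t_mod):.3f}")
```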
Within this expression, the value of the dimensional parameter k, known as the Ross coefficient, depends on several aspects (i.e. module type, wind velocity and integration characteristics). However, dispersed values for this parameter can be found in the literature (in the range of 0.02-0.06 K m²/W) for different module types [7].

Figure 3. Overview of temperature coefficients of different thin film photovoltaic technologies [9]

As can be seen in Figure 3, the efficiency decrease of ZnS-buffered CIGS (copper indium gallium diselenide) is smaller than for CdS-buffered CIGS. With further improvement of alternative buffers (like ZnS, ZnMgO, Zn(O,S)), the efficiency values of CdS-buffered CIGS at 25°C can be reached (demonstrated by Helmholtz Zentrum Berlin [8]). Due to the lower temperature coefficient, even higher efficiencies at operating temperatures (typically ~60°C on a midsummer day) can be expected for the alternative buffers – without using toxic Cd. For the deposition of the different buffer layer materials both sputtering and wet chemical processes are available. Manufacturers typically rate PV modules at standard test conditions. However, the actual energy production of field-installed PV modules is the result of a range of operating temperatures, irradiances and sunlight spectra. Therefore, there is an urgent need to characterize PV modules at different temperatures and irradiances to provide comprehensive rating information [15]. One of the most relevant PV standards issued by the International Electrotechnical Commission (IEC) Technical Committee 82 Working Group 2 (IEC/TC82/WG2) is the IEC 61853 standard titled Photovoltaic Module Performance Testing and Energy Rating (IEC, 2011). This can be seen as relevant for the CIGS PV modules. Why is their high efficiency of relevance in this specific case of an autonomous streetlamp? The technical parameters of streetlamps are defined in four standards which guarantee a common base of light conditions in European cities:

• PD CEN/TR 13201-1:2014 Road lighting. Guidelines on selection of lighting classes (road lighting, lighting systems, roads, lighting levels, classification systems, road safety, traffic flow)
• BS EN 13201-2:2015 Road lighting. Performance requirements (road lighting, lighting systems, roads, performance, classification systems, luminance, glare, lighting levels, road safety, environmental engineering, luminaires, pedestrian-crossing lights)
• BS EN 13201-3:2015 Road lighting. Calculation of performance (road lighting, lighting systems, roads, performance, mathematical calculations, photometry (light measurement), light distribution, lighting levels, luminance, road safety, luminaires)
• BS EN 13201-4:2015 Road lighting. Methods of measuring lighting performance (road lighting, lighting systems, roads, performance, photometry (light measurement), performance testing, test equipment, luminance, lighting levels, luminaires, reports)

Figure 4. Energy saving performance of a PV driven street lamp as an example of the need for standards development in the frame of the FP7 NMP project SolarDesign [2] (the luminance increases when a passenger approaches and is lowered when no passenger is present; standardization proposal 1, Road Lighting CEN/TR 13201-1,2,3,4: to save energy the luminance is adjusted automatically to the actual distance between passenger and lamp, which requires the calculation and measurement of the photometric performance for this case)

To meet all these requirements, the research undertaken in the frame of the FP7 NMP project SolarDesign had to concentrate on an appropriate level of CIGS efficiency to guarantee the luminance defined in the respective standard. To save electric energy, which can be harvested only during the day, the luminance of the street lamp is modulated according to the distance of approaching passengers (by an integrated proximity sensor). This special behaviour, characterized by changing lighting levels, makes a new standard covering the related calculation procedures necessary as well.

V. CHARACTERISTICS OF STANDARDIZATION PROCESS

International standardization from the viewpoint of social structure shows several characteristics. First, the level of knowledge required to be a leader is significantly high.
This leads to the fact that acting as the main editor of a technical standard requires experience and extensive work, which is difficult to carry out as a side job. Standardization at the European level turns out to be a long process [16], different from national standards, which can often be finalized and issued within a few months. At the international level, the full process takes two to five years. There are six main stages in publishing a new European or international standard:

• Proposal stage: originates from a National Committee. The represented countries vote on the interest and nominate experts, who define a work programme with target dates.
• Preparatory stage: a Working Draft (WD) is prepared by a project team.
• Committee stage: a Committee Draft (CD) is submitted to the National Committees for comment.
• Enquiry stage: a Committee Draft for Vote (CDV) is submitted to all National Committees; a majority of two thirds is required.
• Approval stage: a Final Draft International Standard (FDIS) is prepared.
• Publication stage: if the FDIS is approved by a two-thirds majority, the document is published by the International Electrotechnical Commission (IEC) Central Office as an international standard.

To avoid duplication of effort, to speed up standards preparation and to ensure the best use of the available resources, and particularly of the experts' time, the IEC and CEN/CENELEC [5] agreed that, if the results of parallel voting are positive in both the IEC and CEN/CENELEC, the IEC will publish the International Standard, while the CEN/CENELEC Technical Board will ratify the European Standard [17]. In technology-based markets, to absorb external knowledge and, conversely, to stake their claims, stakeholders participate in the global socio-technical network of standardization [12]. The main target stakeholders of SolarDesign to be addressed are:

• Standardization bodies: international standardization organisations (CEN/CENELEC, IEEE, etc.) and national ones such as HZN, AFNOR, DIN, SNV, SIST. They develop and establish product and/or process standards to be followed by producers and application developers.
• Certification entities (EEPCA - Professional Association of the European Certification Bodies, EECC - The European Certification Council, etc.): provide confidence to users that a certain element of the SolarDesign project is produced and/or operated according to a defined set of practices or standards.
• Photovoltaic systems suppliers, installers and service providers for energy-efficient buildings (building integration and architecture) and solar-powered consumer products (e.g. solar lighting, PV-driven street lamp with proximity sensor,
textile integration) through national, European and international associations related to the technologies developed in the project.
• Training providers, such as universities, RTD and dedicated training organisations: provide training to qualify developers and other professionals to work with the SolarDesign project results.
• Local authorities and national/regional public bodies: key players as policy makers, through the creation of a favourable legislative framework and through public procurement.
• Architects' associations: architects need to be provided with appropriate training, tools and guidelines for them to consider the integration of photovoltaic materials.
• Solar-powered consumer users: to be provided with appropriate training, tools and guidelines for them to consider the integration of photovoltaic materials in their products.
• Construction companies' associations and related research associations: should be aware of the new technologies that will be installed.
• Public and private real estate promoters: they can offer their clients the advantage of the developed system, therefore they should be informed about it and about new business models.
• Clients and users (citizens): key actors interested in cooperative working systems or applications, providing their perspectives in the formulation and assessment of the project results.
• Network operators: may act as a channel for offering and billing services and/or access devices to users.
• Energy management agencies: regional and/or national energy agencies promoting efficient and innovative energy technologies.

A basic requirement to minimize the duration of the whole consensus process is a consistently written specification, which requires a skilled and dedicated editor or team of editors. This is often characterized by the fact that there are some hidden structures in the consensus process.

Barriers and problems encountered when contributing to new or revised standards can be summarized as follows [5]: first, timetabling issues, wherein the timeframes of the research project and of the standards development work are not sufficiently aligned, and coordination between the two processes becomes difficult because of the differing stages of progress. Second, difficulties in gaining acceptance for the inputs put forward by the project team, owing to competing research or industrial interests, lobby groups, or results too 'innovative' to be accepted by the standardization groups. Another problem is often based on resource availability over a longer time, given the short-term nature of project funding, often resulting in a funding gap. In cases of limited experience in standardization, additional difficulties might arise from unclear access to standardization bodies and their technical committees, due to membership rules, lack of direct participation, etc. Researchers often see standards development as a time-consuming and difficult process which limits the available time for research and teaching. Project teams might find it difficult to identify suitably qualified experts able and willing to work within the standardization process to implement the project results. In any case, the consortium has to be aware of the learning curve associated with understanding the world of standardization (e.g. how to identify already existing standards, how to propose and make changes to standards, how to gain acceptance, etc.).

Figure 5. Barriers of Links between Research and Standardization [5]

The consensus process will never be an equality-based decision-finding and communication process. Some stakeholders contribute significant efforts to editing the specification in order to drive some favoured business goals [14]. But without such contributions it is difficult to carry out today's international standardization. This situation results in a clear diversification of roles and responsibilities in the complex standardization process, as shown in Table I.

TABLE I. ROLES OF PLAYERS IN INTERNATIONAL STANDARDIZATION [14]

Role                  Description
Editor                Coordinates all inputs and updates, has to be well prepared to provide editorship and can replace other editors if this should become necessary
Contributors          Experts who create new inputs and support discussions; they do not take over responsibility as editors
Driver                The workforce that drives and accelerates the standard with collaborative writing
Resisters             Try to delay or obstruct the progress of a standard
Watcher               The people that watch the trends
Facilitator           Facilitates the consensus process
Process Observer      Watch the consensus process and express opinions on process violations, but avoid responsibility for consensus finding
Lobbyist              People that negotiate under the table for their own business models
Technology Observer   Watch the consensus process for technical quality

VI.
In cases of limited experience in standardization additional difficulties might rise from unclear access to standardization bodies and their technical committees, due to membership rules, lack of direct participation, etc. Researchers often see standards development as a time consuming and difficult process which limits the available time for research and teaching. Project teams might find it difficult to identify suitably qualified experts able and willing to work within the standardization process to implement the project results. Anyway the consortium has to be aware of a learning curve associated with understanding the world of standardization (e.g. how to identify already existing standards, how to propose and make changes to standards, how to gain acceptance, etc.). The consensus process never will be an equalitybased decision-finding and communication process. Some stakeholders contribute significant efforts for editing the specification in order to drive some favoured business goals [14]. But without such contributions, it is difficult to carry out today’s international standardization. This situation results in a clear diversification of roles and responsibilities in the complex standardization process as shown in Table I. TABLE I. Role 162 ROLES OF PLAYERS IN INTERNATIONAL STANDARDIZATION [14] Figure 5. Barriers of Links between Research and Standardization [5] Description MIPRO 2016/MEET VI. CONSORTIUM The SolarDesign consortium is constituted by 11 participants (6 SMEs and 5 research institutions) from 6 countries who gather all the necessary background and expertise to achieve the ambitious research and standardization objectives of the project. Consortium: Technische Universität Wien (Austria), Sunplugged GmbH (Austria), Faktor 3 ApS (Denmark), Innovatec Sensorisatión y Comunicación S.L (Spain), Studio Itinerante Arquitectura S.L. (Spain), RHP Technology GmbH (Austria), Asociación de Industrias de las Technologias Electrónicas y de la Información del País Vasco (Spain), Munich University of Applied Sciences (Germany), Accademia Europea Bolzano (Italy), Università degli Studi di Milano-Bicocca (Italy), Commissariat à l’energie atomique et aux energies alternatives (France). VII. CONCLUSION Integrating photovoltaics into industrial design is in its infancy and taking into account different European perspectives is crucial for its success. Seeing design from a stylish or sensory perspective regional characteristics exist. This is especially true where legal conditions are differing from one European country to another. Dissemination, implementation and standardization cannot be seen as separate activities. They must be fully coordinated and represent two sides of the same plan for success. SolarDesign from the beginning decided to contribute to the role that standardization plays, such as: • Ensuring broad applicability of SolarDesigns’s outcome • Fostering an increase of the efficiency of research and development work • Supporting that interoperability of developed solutions is given with already existing technologies or regulations. • Contributing to standards development where a need was identified [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] ACKNOWLEDGMENT The FP7 NMP SolarDesign Project has received funding from the European Union's Seventh Framework Programme under grant agreement n° 310220. REFERENCES [1] [2] B. Parida, S. Iniyan and R. Goic, “A review of solar photovoltaic technologies,” Renewable and Sustainable Energy Reviews 15 (2011), pp. 1625–1636. 
SolarDesign Newsletter 4, January 2015, http://www.solardesign.eu/ . MIPRO 2016/MEET [16] [17] [18] CEN (European Committee for Standardization)/CENELEC (European Committee for Electrotechnical Standardization), “Why Standards,“ http://www.cencenelec.eu/research/WhyStandards/ Pages/default.aspx. S. Ishizuka, T. Yoshiyama, K. Mizukoshi, A. Yamada and S. Niki, “Monolithically integrated flexible Cu(In,Ga)Se2 solar cell submodules,” Solar Energy Materials and Solar Cells 94 (2010), pp. 2052-2056. CEN (European Committee for Standardization)/CENELEC (European Committee for Electrotechnical Standardization), “Research Study on the Benefits of Linking Innovation and Standardization,” Dec 2014, Ref: J2572/CEN. J. Stroyan and N. Brown, “Study on the contribution of standardization to innovation in European-funded research projects,” technopolis group, Sept 2013 L. Maturi, G. Belluardo, D. Moser and M. Dell Buono, “BiPV system performance and efficiency drops: overview on PV module temperature conditions of different module types,” Energy Procedia 48 (2014), pp. 1311-1319. H. Papathanasiou, “ROHS compatible buffer layers for CIGS based solar modules – first results from the Photovoltaic Alliance Project,” 3rd International Workshop on CIGS Solar Cell Technology, Berlin, 17.-19.04.2012. S. Y. Sohn and Y. Kim, “Economic evaluation model for international standardization of correlated technologies,” IEEE Transactions on Engineering Management, vol. 58, no. 2, May 2011, pp 189-198. DIN (Deutsches Institut fuer Normung e.V.), (2000), “The economic benefits of standardization,” Research Rep., http://www2.din.de/ . Standards Australia. (2006), “Standards and the economy,” Research Report, http://www.standards.org.au/ . E. N. Filipovic, “How to support a standard on a multi-level playing field of standardization: propositions, strategies and contributions,” Proceedings of the 2014 ITU Kaleidoscope Academic Conference: Living in a converged world - Impossible without standards?, 2014, pp: 207-214, DOI: 10.1109/Kaleidoscope.2014.6858464 referenced in: IEEE Conference Publications. S. Wurster, K. Blind and S. Fischer, “Born global standard establishers identification of a new research field and contribution to network theory,” Proceedings of the 8th IEEE Conference on Standardisation and Innovation in Information Technology (SIIT), Sophia-Antipolis, 09/ 2013, doi 10.1109/SIIT.2013.6774583, pp. 1-12 T. Yamakami, “A three-dimensional view model of international standardization using techno-sociological analysis,” 4th International Conference on Interaction Sciences (ICIS), 2011, IEEE Conference Publications, pp. 13-18. G. Tamizh Mani,. K. Paghasian, J. Kuitche, M. Gupta Vemula and G. Sivasubramanian, “Photovoltaic module power rating per IEC 61853-1 standard: a study under natural sunlight,” Arizona State University Photovoltaic Reliability Laboratory (PRL), March 2011, www.solarabcs.org/ratingper61853 . D. Blanquet, P. Boulanger, A. Guerin de Montgareuil, P. Jourde, Ph. Malbranche and F. Mattera, “Advances needed in standardisation of PV components and systems,” Proceedings of 3rd World Conference on Photovoltaic Energy Conversion May 1118, 2003 Osuka, Japan, pp. 1877-1881. IEC International Electrotecnical Commission, “Inside the IEC – Information,” www.iec.ch. CEN (European Committee for Standardization)/CENELEC (European Committee for Electrotechnical Standardization), “The importance of standards,” http://www.cencenelec.eu/research/ tools/ImportanceENs/Pages/default.aspx . 
Open public design methodology and design process

D. Rembold*, S. Jovalekic**
* IES, Albstadt, Germany, rembold@hs-albstadt.de
** IES, Albstadt, Germany, jovalekic@hs-albstadt.de

Abstract - This paper proposes a design methodology and design process for mechatronic systems that incorporates the public community. As a proof of concept, we are building a robot system consisting of the following components: software code files, hardware design files and controller hardware. Every component is available for review on an open repository and review tool, GIT/Gerrit. Designers release components into the repository in the form of commits. Commits can be downloaded by every community member and examined and reviewed in detail by the public community. At certain points in time we pick a number of commits with positive review results and review them from an overall system perspective. If this review passes, we create a real system from the commits. The steps are: software code compilation, creation of hardware components with a 3D printer, and ordering of controller hardware. Then we build the new robot from these components. If we can build the robot system successfully (which is expected due to the overall review) and it passes all test cases, a new release is created from the picked commits. Every community member can download the latest release for their own use. The business case for this design methodology and design process is that we offer the customer a service to conduct tests from a set of reviewed commits. We provide the resources and the knowledge for this service. We inform the customer about the review and test results, and the customer can proceed to improve the system based on the feedback we provide.

I. INTRODUCTION

Recently a new maker community has developed around machine development. The designs are often free to download, and, vice versa, everybody can upload their machine design to dedicated websites. The public community is able to give feedback in the form of remarks next to the upload. Unfortunately this does not really follow a process in the way we know it from companies: the originator of the upload is free to take a look at the remarks or comments, but there is no requirement to act on the feedback.

Every component will be available for review on a public repository and an open review tool such as GIT/Gerrit, see Figure 1.

Figure 1. Design parts in repository

The public community can review every single component and add its approval or rejection to it, see Figure 2. In detail this means that all components to be reviewed are available in the repository in the form of commits, see Figure 3. Commits can be downloaded by every community member and examined and reviewed in detail. At certain points in time we pick a number of commits with positive review results, create a system from them, test it and eventually release the commits if successful, see Figure 4.

Often the uploaded designs lack a sufficient description; for example, we quite often see that a bill of materials (BOM) is missing. We therefore think that more effort must be spent on improving the build description. Our goal in this project is to create a process to build machines (in our case a robot system) consisting of components such as software code, design files, controller hardware and build instructions in the form of videos and BOMs.

Figure 2. Review

During creation of the system, software components are compiled, design files are turned into parts by a 3D printer, and controller hardware is ordered automatically.
Then we build the new robot from these components. If we are able to build the robot system successfully and the robot passes all test cases, a new release is created from the picked commits. The picked commits are then merged into the master repository of GIT for public download.

II. COMMIT DEPENDENCIES

There is a dependency between design files and software code. Updated design files for the robot can result in a new robot geometry layout, and therefore the robot's inverse transformation algorithm needs to be adapted inside the software code. So if there is a change in robot geometry, but the person who created the commit did not change the inverse transformation, we do not pick the design file change, because this is a violation of the process rule. The person who commits must indicate the dependency between the design file commit and the software code commit.

Another dependency lies in the choice of the motor driver, the motor and the robot layout. The motor driver and motor need to be able to handle the inertia of the robot, which is determined by the robot layout. If there is a new commit with a new motor driver, but no commit with a robot layout change, then there must be at minimum a justification that a new robot layout is not needed. A new motor driver might also have consequences for the software that drives it.

There are also dependencies between commits such as software code and test cases. For example, if a motor driver is updated, we expect a new release of a test case which covers the new attributes of the motor driver. Otherwise it would make little sense to introduce a new motor driver if there are no test cases to verify it. (A minimal sketch of such a dependency check is given below, after Figure 4.)

Figure 3. Commits

The business case for this design methodology and design process is to offer the customer a service to conduct tests from a set of reviewed commits. We provide the resources and the knowledge for this service. After executing the process described above, we inform the customer about the test results and the customer can proceed to improve the system based on the feedback we provide.

We offer simulation code for simulation tests of newly committed software code. However, new design files can cause a change in the robot geometry layout, so the simulation program must be adapted here as well. We are working on providing a configuration file for the robot's geometry. In future we will not pick any design file commits if the robot's geometry configuration file was not adapted accordingly.

III. AUTOMATIC TESTING

As continuous integration tool we use Jenkins to gather the commits and to process the components automatically. Jenkins performs several steps on the commits; the steps depend on the content of the commits. In the first step, Jenkins extracts reviewed commits containing software driver changes and compiles a new driver level. Jenkins then extracts the test cases and the robot geometry layout and updates the simulation program. The driver is then tested against the simulation. If the commits contain new robot components in the form of design files, Jenkins loads the design files automatically into the 3D printer, which creates them one by one. Jenkins passes the content of commits with purchasable parts, such as new robot controller parts, to an ordering software, which automatically orders them.

Figure 4. New release from commits
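The process rule from Section II can also be checked mechanically before commits are picked. The following is a minimal, hypothetical C# sketch of such a check; the repository layout it assumes (paths under design/ and software/inverse_transform/, and a file config/robot_geometry.cfg) is illustrative only and not the project's actual structure.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch: checks the "design change requires inverse
// transformation change" rule over the set of files touched by the
// picked commits. Path prefixes are assumptions, not the project's layout.
static class DependencyRuleCheck
{
    public static bool DesignAndSoftwareConsistent(IEnumerable<string> changedFiles)
    {
        var files = changedFiles.ToList();

        // A geometry-relevant design change: any Sketchup/stl/gcode file.
        bool designChanged = files.Any(f =>
            f.StartsWith("design/") &&
            (f.EndsWith(".skp") || f.EndsWith(".stl") || f.EndsWith(".gcode")));

        // The matching software change: the inverse transformation sources
        // or the robot geometry configuration file.
        bool inverseTransformChanged = files.Any(f =>
            f.StartsWith("software/inverse_transform/") ||
            f == "config/robot_geometry.cfg");

        // Rule: a design change without an inverse transformation (or
        // geometry configuration) change is a process violation, so the
        // design commit must not be picked.
        return !designChanged || inverseTransformChanged;
    }

    static void Main()
    {
        var picked = new[] { "design/carriage.stl", "software/motor/driver.cpp" };
        Console.WriteLine(DesignAndSoftwareConsistent(picked)
            ? "OK to pick"
            : "Violation: design changed without inverse transformation update");
    }
}
```

In practice such a check could run as one of the Jenkins steps described in Section III and simply withhold a positive review vote in Gerrit when the rule is violated.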
We cannot do the assembly automatically due to its complexity, so this is done manually. The tester of the robot system assembles the robot with the newly created components (parts created from the design files and robot controller parts), loads the compiled software onto the controllers and tests the system.

IV. BUSINESS IDEA

We are working toward the goal of using this process to create a business opportunity: to offer a service to customers. There might be customers who are interested in one of our designs, but in general the designs do not completely fulfil their requirements. The customer can download the design and create their own system from it, and they often have a good understanding of what needs to be done differently. However, often the customer does not have the knowledge to build a new system, or is not willing to spend the effort. So the customer can ask us for a service to conduct tests from a set of reviewed commits. The commits can be existing commits, commits provided by the customer, or requested design changes we released for the customer. We provide the resources to make the required design changes and to conduct tests. Every design change is uploaded to git in the form of a commit. We retrieve the requested commits and apply them to a process in three phases.

A. First Phase

In the first phase we pick components and test them in simulation only; Figure 5 depicts this process.

1. The customer, in consultation with us, selects the committed software code from our repository, and we pull the software code and create a driver from it.
2. We pull simulation software from the repository if, for example, updated design files require this. This is the case when the design files cause a new robot geometry layout. A new simulation system is built from the new design files.
3. We run and test the driver against the simulation with test cases located in the repository. In some cases the test cases need to be updated due to a new robot layout or new robot controller parts.
4. If the simulation fails, we create a list of errors and a list of suggestions for improvement (e.g. with a ticketing system) and hand them back to the customer.

Figure 5. First Phase

If all tests pass in the first phase, we can move to the second phase.

B. Second Phase

The purpose of the previous phase was to test the software code and the hardware in simulation only. The second phase is concerned with creating the real hardware of the robot system, see Figure 6.

1. If the customer requires new controller hardware described in the commits, and the commits passed the reviews, Jenkins pulls the content of the commits from the repository and automatically orders the hardware components using a consumer goods ordering system.
2. If there are new design files, Jenkins pulls the commits with design files from the repository, loads them into the 3D printer and prints the updated parts.
3. From all newly created parts, we manually build the new robot system.
4. We run test cases against the real hardware. These are the same test cases which we previously ran against the simulation in Phase 1.
5. If the test against real hardware fails, we create a list of errors and a list of suggestions for improvement and hand them back to the customer.

Figure 6. Second Phase

If the second phase passes all tests, we move to the third phase.

C. Third Phase

The second phase was about testing the system in a real hardware environment.
If all test cases pass the second phase, we proceed with the git repository itself, see Figure 7.

1. We gather all commits contributed in Phase 1 and Phase 2 and create a new release level from them.
2. All commits are merged into the master branch of the repository.
3. The system is demonstrated to the customer. The customer accepts the system if satisfied, and we bill them.

Figure 7. Third Phase

V. SYSTEM COMPONENTS

In the following you find a summary of the components which are released into the git repository and a list of software tools used to set up the process. All components can be openly pulled. Everybody can also review all components and add comments. The comments are shown publicly.

A. Control Software

The control software is code which drives the motors and contains control interfaces to move the robot head from one point to another. Currently two sorts of movement are possible: linear movement and point-to-point movement. To translate the movements into motor signals, we need to transform Cartesian coordinates into motor positions. The software component "inverse transformation" deals with this task.

We purposely use UML to generate code automatically from the UML diagrams. The advantage of UML is that a graphical representation of the code is easier to review than pure C++ code. There are certainly drawbacks with UML, such as the fact that fixes made in pure C++ code must be phased back into UML, which is often quite cumbersome. There are two sorts of diagrams we store in the git repository: UML class diagrams and UML object diagrams.

Currently we have a linear programming approach for our control software, meaning that each task is executed in sequence. We will enhance this to a real-time operating system containing a scheduler, dispatcher, etc., so that we will be able to execute tasks concurrently and assign priorities to tasks. For example, a task with low priority can be the driver for the display. Below you find a list of software components stored in the git repository.

• UML class diagrams
• UML object diagrams
• inverse transformation algorithm
• point-to-point movement algorithm
• linear movement algorithm
• control program
• future topic: scheduler
• future topic: preemptive multitasking code
• test cases

B. Simulation Software

The reason for using simulation software before testing on real hardware is to save the effort of building up the real hardware just for testing the software. Revealing software problems early might lead to the decision to stop building the hardware, which saves time and effort. Another reason is to continuously test the software released in the repository: single commits can be pulled automatically and drivers can be built from them. Tools such as Jenkins are suited for this kind of task.

The simulation system we build will be a Mathematica model. It offers the feature of generating source code from the model, and we therefore release Mathematica models into the git repository. The model is currently purely static, meaning that the motor dynamics and the robot system's inertia are not taken into account. In future we will implement a dynamic model as well.

We will configure the static model of the robot in a configuration file (a hypothetical sketch is given below). This means that if we have a design change of the robot, we often do not have to touch the Mathematica model, but just need to update the configuration file. In future we will also combine the stl design elements with the simulation, so the robot system can be tested visually.
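The paper does not specify the format of this geometry configuration file; the following is a minimal, hypothetical C# sketch assuming a simple key = value text format and typical delta-robot geometry parameters. Both the parameter names and the file layout are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical robot geometry configuration, kept next to the Mathematica
// model so that a pure geometry change only touches this file.
public class RobotGeometry
{
    public double BaseRadiusMm { get; set; }        // assumed parameter names,
    public double EffectorRadiusMm { get; set; }    // typical for a delta robot
    public double UpperArmLengthMm { get; set; }
    public double ParallelRodLengthMm { get; set; }

    // Parses lines of the form "key = value"; '#' starts a comment.
    public static RobotGeometry Load(string path)
    {
        Dictionary<string, double> values = File.ReadAllLines(path)
            .Select(line => line.Split('#')[0].Trim())
            .Where(line => line.Contains("="))
            .Select(line => line.Split(new[] { '=' }, 2))
            .ToDictionary(
                parts => parts[0].Trim(),
                parts => double.Parse(parts[1].Trim(), CultureInfo.InvariantCulture));

        return new RobotGeometry
        {
            BaseRadiusMm = values["base_radius_mm"],
            EffectorRadiusMm = values["effector_radius_mm"],
            UpperArmLengthMm = values["upper_arm_length_mm"],
            ParallelRodLengthMm = values["parallel_rod_length_mm"]
        };
    }
}
```

With such a file in the repository, the rule from Section II (no design-file commit without a corresponding geometry update) becomes mechanically checkable, and the Mathematica model and the simulation program can read the same values.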
The following simulation components are released into the repository:

• mathematical models (Mathematica export files, source code)
• pure static model
• dynamic model (future topic)
• configuration file for the robot's geometry
• future topic: graphical simulation elements

C. Robot Controller

We have set ourselves the requirement that we do not design any component of the robot controller, such as the ARM32 board and the step motor driver board, apart from the interface connector between the components. We place weight on the fact that these components must be purchasable. This is where the consumer goods ordering system comes into effect: Jenkins triggers the ordering process with the ordering data inside the commit. The following descriptions for automatic ordering are stored in the repository:

• power supply
• controller hardware: ARM32 board, step motor controller and the ordering data

Other files going into the repository:

• interface connector for the ARM32 board and step controller board
• design description
• complete layout plan with purely purchasable standard components and their interface connectors

D. Robot

The component files (design files) of the robot are released as Sketchup files, files in stl format and files in gcode format. Providing the gcode files makes it easy to automatically move them to the 3D printer and print them out; this again can be partially triggered by Jenkins. There are a few parts of the robot which cannot (or should not) be printed, such as bearings, screws, carriages, step motors etc. Here the consumer goods ordering system comes into play again: the person who releases purchasable parts must fill in the commit comments with data on where to buy the part. Jenkins triggers the ordering application, gathers the ordering data from the commit and automatically orders the parts. The following is released in git:

• components (design files)
• 3D printer parts
• carriages
• step motors

E. Software

We mostly use publicly available software to accomplish the system we are proposing. First, the git repository, which is becoming more and more popular as a repository for code and other design files. We use Gerrit as a review tool, so the community can check for newly released components to review. Every community member can download the new components and check them for their own purposes. Gerrit offers a mechanism to evaluate a new component by pressing radio buttons with attributes such as "ok to release" or "prefer not to release". In the end, the administrator has the right to overrule the comments of the public community if required, but in general the comments are taken very seriously.

Jenkins is a continuous integration tool that uses a Gerrit plugin. For example, if there is a control software release, Jenkins can automatically download the files, compile them and test them against a simulation. Jenkins can then even set an evaluation of the released code automatically, depending on whether the test cases run against the simulation were successful or not. The following software tools are currently used to accomplish the proposed system:

• git repository
• Gerrit review tool
• Jenkins, a continuous integration tool
• consumer goods ordering system

VI. OUTCOME AND CONCLUSION

In this paper we presented a new design methodology for machinery which includes the public community in reviewing the machine parts, consisting of software code, controllers, mechanical parts and more. We are still at an early stage. GIT, Jenkins etc.
are set up, but the team is struggling to get used to the work process. Also, some of the automatic features are still missing, such as downloading code to flash the controller memory. In our case we studied mainly the design of a delta robot, but we believe that the concept can be extended to any kind of machinery. In Figure 8 you can see an improved version of the robot "Cherry Pi II", released on Thingiverse, which we redesigned while trying out the proposed concept. This is the first robot which is undergoing the proposed process. The redesign involved the removal of mechanical play from the carriages. Another delta robot is currently being created, see Figure 9, which shows a redesign of the carriage using only low-cost parts. Our study currently does not involve the open public, but only a selected number of persons.

Figure 8. Cherry Pi II robot

In future we will involve the public community too. The incentive for the public community to review the components is that they can download all parts of the machinery for free, to build devices for their own purposes. Therefore the open public does have an interest in continuously improving the designs located in the public repository.

Figure 9. Redesign of the carriage

The administration of the repository needs funding, so we came up with a business idea: we offer a purchasable service to our partners to test out the machinery according to their demands. Customers can take a look at the machines we offer on the website and convince themselves that we have the competence to build such machinery. The service includes consulting, simulation and testing.

ACKNOWLEDGMENT

We would like to thank the University of Applied Sciences Albstadt-Sigmaringen for sponsoring this project within the Program Fit4Research. Without this funding we would have had no opportunity to start the project described in this paper. We would also like to thank Florian Wiest for contributing so many new research ideas in this area, and we thank our students Sinem Guruplar and Goncagül Albayrak, whose thesis work provided the outcomes presented in this paper.

DC VIS

International Conference on DISTRIBUTED COMPUTING, VISUALIZATION SYSTEMS AND BIOMEDICAL ENGINEERING

Steering Committee

Chairs: Karolj Skala, Ruđer Bošković Institute, Zagreb, Croatia; Roman Trobec, Jožef Stefan Institute, Ljubljana, Slovenia; Uroš Stanič, Biomedical Research Institute-BRIS, Ljubljana, Slovenia

Members: Enis Afgan, Ruđer Bošković Institute, Zagreb, Croatia; Almir Badnjević, International Burch University, Sarajevo, Bosnia and Herzegovina; Piotr Bala, Nicolaus Copernicus University, Toruń, Poland; Leo Budin, University of Zagreb, Croatia; Borut Geršak, University Medical Center, Ljubljana, Slovenia; Simeon Grazio, "Sisters of Mercy" University Hospital Centre, Zagreb, Croatia; Gordan Gulan, University of Rijeka, Croatia; Yike Guo, Imperial College London, UK; Ladislav Hluchý, Slovak Academy of Sciences, Bratislava, Slovakia; Željko Jeričević, University of Rijeka, Croatia; Peter Kacsuk, Hungarian Academy of Science, Budapest, Hungary; Aneta Karaivanova, Bulgarian Academy of Sciences, Sofia, Bulgaria; Zalika Klemenc-Ketiš, University of Maribor, Slovenia; Charles Loomis, Laboratory of the Linear Accelerator, Orsay, France; Luděk Matyska, Masaryk University, Brno, Czech Republic; Željka Mihajlović, University of Zagreb, Croatia; Damjan Miklavčič, University of Ljubljana, Slovenia; Tonka Poplas Susič, Health Centre Ljubljana, Slovenia; Laszlo Szirmay-Kalos, Technical University of Budapest, Hungary; Tibor Vámos, Hungarian Academy of Sciences, Budapest, Hungary; Matjaž Veselko, University Medical Center, Ljubljana, Slovenia; Yingwei Wang, University of Prince Edward Island, Charlottetown, Canada

INVITED PAPER

Views on the Role and Importance of Dew Computing in the Service and Control Technology

Zorislav Šojat, Karolj Skala
Ruđer Bošković Institute, Centre for Informatics and Computing, Zagreb, Croatia
sojat@irb.hr, skala@irb.hr

Abstract - Modern-day computing paradigms cater for a huge community of involved participants from almost the entire spectrum of human endeavour. For computing and data processing there are individual Computers, their Clusters, Grids and, finally, the Clouds. For pure data communication there is the Internet, and for human-understandable information communication, for example, the World Wide Web. The rapid development of hand-held mobile devices with high computational capabilities and Internet connectivity has enabled certain parts of the Clouds to be "lowered" into the so-called "thin clients". This led to the development of the Fog-Computing Paradigm as well as the development of the Internet of Things (IoT) and Internet of Everything (IoE) concepts.
However, the most significant amount of information processing all around us is done at the lowest possible computing level, directly connected to the physical environment and mostly directly controlling our immediate human surroundings. These "invisible" information processing devices we find in our car's motor, in the refrigerator, the gas boiler, air-conditioners, vending machines, musical instruments, radio-receivers, home entertainment systems, traffic-controls, theatres, lights, wood-burning stoves, and ubiquitously all over industry and in industrial products. These devices, which are neither at the cloud/fog edge, nor even at the mobile edge, but rather at the physical edge of computing, are the basis of the Dew Computing Paradigm. The merits of seamlessly integrating those "dew" devices into the Cloud - Fog - Dew Computing hierarchy are enormous, for individuals, the public and industrial sectors, the scientific community and the commercial sector, by bettering the physical and communicational, as well as the intellectual, immediate human environment. In the possibility of developing integrated home management/entertainment/maintenance systems, self-organising traffic-control systems, intelligent driver suggestion systems, coordinated building/car/traffic pollution control systems, real-time hospital systems with all patient and equipment status and control collaborating with the medical staff, fully consistent synaesthetic artistic performances including artists and independent individuals ("active public") from far apart, power distribution peak filtering, self-reorganisation and mutual cooperation systems based on informed behaviour of individual power consumption elements, emergency systems which cooperate with the town traffic, etc., the Dew-Computing paradigm shows the way towards the Distributed Information Services Environment (DISE), and finally towards the present civilisation's aim of establishing a Global Information Processing Environment (GIPE). It is therefore essential, through Research, Innovation and Development, to explore the realm of possibilities of Dew Computing, solve the basic problems of integration of the "dew" level with the higher-level Dew-Fog-Cloud hierarchy, with special attention to the necessity of information (not only data) processing and communication, and demonstrate the viability and high effectiveness of the developed architecture in several areas of human endeavour through real-life implementations. The present main scientific and technological objective is to provide the concepts, methods and proof-of-concept implementations that move Dew Computing from a theoretical/experimental concept to a validated technology. Finally, it will be necessary to define and standardise the basics of the Dew Computing Architecture, Language and Ontology, which is a necessity for the seamless integration of the emerging new Global Information Processing Architecture into the Fog and Cloud Paradigms, as a way towards the above-mentioned civilisation goals.

I. INTRODUCTION

From the very beginning of the "computer era" our civilisation has tended towards becoming a global information society, and the early developments were already aimed at computer and data/information interconnectedness. The science fiction of the 1960s/1970s, as well as many scientists predicting possible futures, actually envisioned present-day informatisation.
The primary predisposition for that development was achieving substantial speed improvements of processors and their interconnections, and the ability to process increasing amounts of data in memory. Through considerable advances of computer science High-performance distributed computing systems were founded on the Grid computing paradigm, while scalable distributed computing systems evolved through the Cloud, later Fog and now the new computing paradigm called Dew computing. The exponential growth of Internet of Things (IoT), Internet of Everything (IoE) and Big Data processing led to the necessity of horizontal and vertical scalability of distributed resources at multiple levels or layers. To facilitate the rapidly developing complex distributed computer systems and meet the performance, availability, reliability, manageability, and cost adapted for requirements of today’s, and specially tomorrow’s users, a three-layer scalable distributed platform evolved (Cloud - Fog - Dew computing). 175 Dew computing is a new computing paradigm, which appeared after the widespread acceptance of Cloud Computing. The initial research of Dew computing started from the Cloud Computing architecture, and the natural metaphor of a cloud lead to the metaphor of the dew. Dew computing can be seen as a ground, or physical, edge of Cloud Computing. II. THE STORY OF THE DEW The spread of data processing devices into all segments and all layers of modern life in our civilisation has to be, naturally, followed by specific system architectures. By realising that there is a vast amount of processing power and storage spread around the world into quite minute hardware systems, it became obvious that this widespread distribution of resources could be used on a more global scale. Actually this type of system architectures development, the development of architectures which are not computer oriented, but oriented towards reasonably effective exploitation of distributed resources, can be seen to have proceeded in several stages. The first of these stages we can already recognise in the development of the environment of the first large mainframes and supercomputers during the 1960-ies. A typical supercomputer (often as a multiprocessor setup, specially from the second half of the 1960-ies) would actually use quite a few other computers to perform the input/output and telecommunications functions, as well as to prepare data and analyse the results. The cost and speed of the supercomputer was such that it, although generally able to stand as a fully self-standing computer system, would not be used for menial tasks, except during installation and testing, and therefore would actually never be used without it's distributed computers processing environment. The other important distribution was through the data communication channels towards other computing systems. A typical large university, institution or multinational company would have had all their major computing resources always communicational interconnected. Those were the starting points of development of mentioned higher level communication and processing distribution (i.e. not computer-oriented) architectures from which probably the Internet is the most generally known. The Era of Supercomputers lasted till the second half of the 1990-ies. 
In the meantime the interconnection of computers in the world already lead to the Internet, and a lot of distribution problems were tackled and solved like the distribution of storage, through the Network File System (NFS), presently probably the most used distributed file system; the distribution of tasks and requests, like for example in the Remote Procedure Call protocol (RPC); or the distribution of time, through the community developed Network Time Protocol (NTP); and the distribution of hypertext human-readable information through the World Wide Web (WWW), invented by Tim Berners-Lee in 1989. With the vast spread of computing (i.e. mostly model, data and less information processing), the availability of faster and faster individual hardware processing devices, and the availability of high-quality inter-computer communication links and protocols, and 176 finally, the huge availability of extremely cheap "personal" computers, naturally lead to a lot of investigation into the possibility of using a bunch of the so-called "of the shelf" "personal" computers in parallel, as to enable high-speed parallel processing. The task of organising a bunch of heterogeneous interconnected computers into a stable, usable, fast and easily programmable and maintainable system has shown to be quite complex. Though the interconnection of the cluster elements was solved through the long-time stabilised Ethernet standards and internet protocols, the major problem, not solved to this very day, was and is the already mentioned organisation of a heap of individual computers into a consistent and easily controllable, programmable and fault-tolerant, reconfigurable environment of sometimes hugely heterogeneous individual elements. To get a reliable network of computer clusters on which computational jobs could be executed at the best place when it is most convenient, the Grid Computing paradigm was developed. A lot of scientific areas benefit from the Grid paradigm, as much of their jobs can be Workflows of quite different data sizes and processing complexity either overall or at the level of workflow elements. Huge experimental data sets, the general cycle of scientific exploration and computer use (e.g. experimentation, thinking, writing, programming...), etc. generally do not require the computer results interactively, therefore the Grid architecture enables them much higher processing equipment utilization than it would be possible on any local system. Unfortunately, outside some scientific and other complex computing environments (like e.g. statistics, business information processing, film rendering etc.), the Grid paradigm, due to its "batch-processing" general approach, is not easily or at all directly applicable. Obviously a non-interactive environment is neither appropriate for the general public, nor for artists and scientists which deal with real-time phenomena. Out of that situation a new notion in computing paradigms and appropriate architectural structures have been born. A Cloud. The Cloud Paradigm (or, if you prefer, actually the Cloud Allegory) means organisation of such a processing structure that computing, data retention, data retrieval etc. is done somewhere-anywhere in the world by systems of clusters, and then provided to anybody needing data storage and/or processing for any reason whatsoever, by the so-called "service providers". 
The user interfaces changed and adapted themselves to the needs of a more general public, and a lot of cloudcompatible programmes are written to accustom many differing end-user needs. However, the Clouds themselves, as opposed to the Grids, are not user-level controllable or programmable, and therefore there is a lot of proprietary solutions, though, in the manner of a Cloud, this huge heterogeneity of the individual computers, clusters and grids of clusters inside any particular available Cloud (provided by a cloud-provider) is, or at least tends to be, much hidden from the end-user. As with all the nomenclature of different generations or system architectures mentioned, all of them started MIPRO 2016/DC VIS being implemented and used often much earlier than their names have been invented. The development process in science has in this sense a two-sided push. Something is invented, starts being used, becomes a seed for more global thought, a name and paradigm is invented, scientifically investigated and than applied to generalise the usage of such inventions. Solving problems of generalisation of a specific paradigm leads to a new circle of invention... in a kind of spiralling development. After seeing the Cloud we realised that we have to bring processing also closer to the Earth, to the user/consumer edge. So the Cloud slowly settled into the Fog. Probably the first actual fully fledged Fog paradigm idea was the development of the Postscript language - a language specifically designed for Printers, not for Computers. Actually there is no reason whatsoever (except speed) not to use your printer to calculate for example appropriate sine waves and musical envelopes for your computer resident audio generation programme. So we here immediately see the Fog Paradigm - using "lowest" level equipment to process part of the wider processing need. The adoption of Postscript for real-time calculations and rendering in the SUN NeWS and the NeXTStep display systems, which allowed to run Display Postscript applications on remote viewing foreground stations (the Display Servers) controlled and programmable from the background servers (the Display Clients), enabled splitting the processing of certain applications seamlessly into distinct computers, the central ones (the Clients) and the peripheral ones (the Servers)1. This idea, combined with the, in that time already ubiquitous, HyperText Markup Language (HTML) led to the development, again by SUN Microsystems, of the Java programming language, and the JavaScript branch. To mention is also the standardisation development done for the Wireless Application Protocol (WAP), and the associated Wireless Mark-up Language (WML), which, though in intensive use quite short lived, did define some important points for further development. Many other such user-station execution languages/environments have also been developed, like e.g. Flash etc. So nowadays the situation is that we do have a kind of primitive "operating system based on device-distributed 'intelligence'" common to all modern web-browsers, and which allows a lot of processing to be done on the enduser (i.e. hierarchically lowest processing level) equipment - from rendering properly human-readable text, through decoding video and audio streams, up to executing fully functional full-fledged important applications, or even parts of scientific crowd-supported distributed computing environment jobs. 
Now we are in the situation mentioned above that we have to scientifically develop this which we already 1 Please beware the terminology. A display terminal is a Server, which serves the remote requests of a possibly larger number of compute-server-type display Clients. The human looks at the output of the Server, and is mostly interested in the calculations of the Clients. MIPRO 2016/DC VIS perceive as a computing paradigm, and gave it the name of Clouds Laying Low - the Fog. In this effort we will have to solve a lot of problems to make the usage of Things (as predicted by the buzz-name Internet of Things) seamlessly coordinated through the Fog Architecture with the Clouds, Grids, Clusters and individual Computers, as to be really usable in the emerging Global Information Processing Environment. So, with the Fog we came down from the Clouds to the "intelligent" Things in our everyday environment, like our mobile communicators, or our home-entertainment television-sets, or even our refrigerators and ovens, many of which show the tendency to be connected to the Internet. The real challenge now is to find a common communication framework, information structure and language to be able to have a grasp over the uncountable possibilities of such an integrated global system - the addition of the Internet of Things and the Fog paradigm onto the already existing huge global processing infrastructure. And one could imagine a highly refined home oven helping, in free time, to calculate the ephemeris of some just discovered asteroid, or to help with its own cooking experience some newly opened restaurant on the other side of the world. Believe it or not, this is the goal our present day development is aiming at. III. WHAT DEW CAN DO However, as all cells in our body are not primarily information processors, like neurons, but the vast majority of them are physically controlling their global environment - our common human body (although also processing and exchanging information), the same applies to all of our Environment, natural or human made. The vast majority of things we see all around us have no processing power at all: the house, the car motor, the bathtub, the road or the chair. Or do they? As opposed to your internet-connected oven, which has to have quite a high processing power already due to the complexity of internet communication itself, your car motor can work without any computing equipment at all although we use presently more and more processors to fine-tune the mechanical behaviour of the motor based on environmental conditions. However, there is actually no possibility for this kind of processors to calculate PI for some scientist on the other side of the globe. But they do process a lot of environment information and generate a lot of control and status data. The road does not by the first sight seem as something which would process information. However, a lot of modern roads do sense traffic, know their surface temperature and even some of them can sense if they are covered by rain, snow, ice or mud. That is, in other words, even our modern roads collect environment data and distribute it to higher levels, primarily for traffic counting and direct traffic control if necessary. "Intelligent homes", massage chairs, hydro-massage bathtubs, all of those have more and more integrated control processors. The water and heating boilers, the airconditioners, the washing-machines... 
are all controlled by processors, gathering environment information and controlling that same environment in a specific way.

This is the very Dew of Computing. And these simple computers influence our everyday physical life in an extremely strong way, with the aim of keeping our immediate environment as close to our wishes as possible. Their physical influence is truly immediate and tangible, an influence on the physical quality of human life which no Computer, Cluster, Grid, Cloud or Fog device could ever dream of having.

The viability and high effectiveness of the Dew computing paradigm and emerging architecture is quite obvious in many important areas of real-life implementation (weather/soil conditions, traffic, medical emergency, smart home, distribution balancing, environment protection, etc.). The merits of seamlessly integrating "dew" devices into the Cloud - Fog - Dew Computing hierarchy are enormous, for individuals, the public and industrial sectors, the scientific community and the commercial sector, by bettering the physical and communicational, as well as the intellectual, immediate human environment. In the possibility of developing integrated home management/entertainment/maintenance systems, self-organising traffic-control systems, intelligent driver suggestion systems, coordinated building/car/traffic pollution control systems, real-time hospital systems with all patient and equipment status and control collaborating with the medical staff, load-balancing energy and water distribution systems, emergency systems which cooperate with the town traffic, fully consistent synaesthetic artistic performances including artists and independent individuals ("active public") from far apart, power distribution peak filtering, self-reorganisation and mutual cooperation systems based on informed behaviour of individual power consumption elements, etc., the Dew-Computing paradigm shows the way towards the Distributed Information Services Environment (DISE), and finally towards the present civilisation's aim of establishing a Global Information Processing Environment (GIPE). (It is essential to understand that on the level of Dew Computing the human is the prime and final decision maker: it is up to the person whether to heed the warnings of the car, or to demand water even if the system is trying to reduce overall consumption.)

IV. THE DEW

The mission of Dew computing is to fully realize the potentials of personal/body and human environment-control computers in symbiosis with Cloud services. However, the complexity of interconnectivity, and even more the heterogeneity of the equipment used, grows drastically as we approach the Dew computing level. Whereas the Cloud and Fog Computing paradigms address the principles of operation on complex computations using massive amounts of context-free data, Dew computing is context-aware, giving meaning to the data being processed. Data is context-free, while information is data with accompanying meta-data; the meta-data places the data in a specific context. Information processing (data + meta-data) enables the application of self-organisation on all levels, providing a much wider scope of problems which can be solved, and also allows more comprehensive analysis and deeper knowledge discovery. The Cloud and Fog Computing paradigms operate on huge quantities of raw data generated by specific Things, via predefined services.
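To make the data/information distinction above concrete, here is a minimal illustrative C# sketch (not from the paper) of how a dew-level device could wrap a raw reading with the meta-data that places it in context; the field names and the road-sensor example are assumptions chosen to match the road example given earlier in the paper.

```csharp
using System;

// Dew-level "information": a raw value (data) together with the meta-data
// that places it in context. Field names are illustrative assumptions.
public record DewInformation(
    double Value,
    string Quantity,       // e.g. "surface_temperature"
    string Unit,           // e.g. "degC"
    string DeviceId,       // e.g. "road-sensor-17"
    string Location,       // e.g. "A1, km 42.3"
    DateTimeOffset Time);

public static class DewExample
{
    public static void Main()
    {
        var info = new DewInformation(
            -1.5, "surface_temperature", "degC",
            "road-sensor-17", "A1, km 42.3", DateTimeOffset.UtcNow);

        // Stripped of its meta-data, the value -1.5 is context-free data;
        // with the meta-data a dew device can act on it locally,
        // e.g. raise an ice warning for this stretch of road.
        bool iceWarning = info.Quantity == "surface_temperature" && info.Value <= 0.0;
        Console.WriteLine($"{info.Location}: ice warning = {iceWarning}");
    }
}
```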
Since the raw data is out of context, the services need to be tailored and application specific, requiring data driven decisions. Building an integrated scalable heterogeneous distributed computing environment from the level of Cloud or Fog is currently not plausible (or viable), as the lack of contextual information disallows generic integration of all processing elements. To solve the problems of everyday human/computer interaction (communication) through ergonomic and human-oriented end-user interfaces, and therefore also to allow the adaptability of the whole user environment system to human needs and wishes, Dew computing has to be based on Information-oriented processing rather than being Data-oriented. In the Dew computing scenario individual Things are responsible for collecting/generating the raw data, and are thus the only components of the computing ecosystem which are completely aware of the context the data were generated in, therefore dew-devices actually must produce and exchange Information. The idea behind the Dew computing is primarily to use the resources at the lowest level as self-organising systems solving everyday human and industrial environment problems. Significant advances and savings can be achieved by using such physical-edge computing in the areas of, for example, traffic control, power distribution (balancing, peek protection, etc.), integrated home systems, medical systems, emergency public services systems, industrial control systems (specifically those necessitating high level, e.g. Cloud, services) etc. On the physical-edge level a vast majority of Dewdevices will not be connected primarily or at all to the Internet or to the mobile phone network for communication with other global computing environment elements. As the impact of Dew-devices is primarily space-oriented, i.e. the device's actions are directly connected, by action or communication, with their immediate physical and informational surrounding, i.e. environment, for communication those devices will mostly use near-communication means as e.g. direct cabling, free line optical communication, short to medium distance radio (e.g. Bluetooth) or even electricity distribution line communication (e.g. X-10). Actually the "dew" devices are exactly on the opposite side of the High Performance Computing (HPC), we could call them Low Performance Information Processing (LPIP). HPC presupposes high data and computing rates, whereas LPIP presupposes low information and decision rates. However, just to mention, the sum of the processing power of all LPIP devices on our planet highly supersedes MIPRO 2016/DC VIS the processing power of all HPC systems, though their specialisations are on opposite sides of the human endeavour spectrum, therefore they are actually incommensurable. It is obvious from the above that Dew computing in the sense of information processing actually presupposes extremely fine granulation of parallelism. On the lowest level there are devices with slow serial processing speeds and small to extremely small data/information retention and processing possibilities. Therefore the Dew paradigm is radically different from the Fog/Cloud or Cluster/Grid paradigms, as its lowest and low level devices cannot be used in a "conventional" programmable way, but they have to cooperate on that lowest level to solve human environment problems or needs, and be able to pass (and consume) information from all hierarchical levels. 
the processing power of all HPC systems, though their specialisations are on opposite sides of the human endeavour spectrum, and they are therefore actually incommensurable.

It is obvious from the above that Dew computing, in the sense of information processing, actually presupposes extremely fine granulation of parallelism. At the lowest level there are devices with slow serial processing speeds and small to extremely small data/information retention and processing possibilities. Therefore the Dew paradigm is radically different from the Fog/Cloud or Cluster/Grid paradigms, as its lowest and low-level devices cannot be used in a "conventional" programmable way; they have to cooperate at that lowest level to solve human environment problems or needs, and be able to pass (and consume) information from all hierarchical levels.

Dew computing must be a generic and general architecture, able to include all of the present and future hybrid and extremely heterogeneous information gatherers, information distributors, information processors, information presenters and information consumers, at the lowest level directly connected to the everyday machines which are part of the common modern human physical environment, and at the highest level interconnected into the global information processing and distribution system. To achieve this it is essential to develop and define a generic communication language and a full ontology, and not just communication protocols, which are applicable in data processing environments but not in information processing systems. Only in such a way will we be able to cope with the complexity and heterogeneity of future equipment, and be able to harness the vast possibilities which a Global Information Processing System can offer.

V. CONCLUSION

The Dew-Computing Paradigm may well be the final missing ingredient of computing development, transforming the all-pervading clusters, grids, clouds and fogs of computers into a human-helping Global Information Processing Environment. Properly solving many of the envisioned problems at this very beginning of a New Information Processing Era may well spare our children from coping with myriads of incompatible and pseudo-compatible systems which necessitate "higher magical skills" to get them to do what is necessary, needed or wanted.

ACKNOWLEDGEMENTS

This work was, in part, supported by the European Commission through the Horizon 2020 EGI Engage and INDIGO DC projects.
PAPERS

DISTRIBUTED COMPUTING AND CLOUD COMPUTING

Parameters that affect the parallel execution speed of programs in multi-core processor computers

Valon Xhafa*, Flaka Dika**
* University of Prishtina/Department of Computer Engineering, Prishtina, Kosovo
** University of Vienna/Faculty of Computer Science, Vienna, Austria
valon.xhafa@uni-pr.edu

Abstract - The speed at which computers work depends directly on the number of processor cores used in the parallel execution of programs. However, there are other parameters that have an impact on computer speed, such as the maximum allowed number of threads and the size and organization of the cache memory inside the processors. To determine the impact of each of these parameters, we have experimented with different types of computers, measuring their speed while solving a specific problem and during image processing. To compare the impacts of particular parameters on computer speed, the results are presented graphically, through joint diagrams for each parameter.

I. INTRODUCTION

Currently there are different types of computers with multi-core processors on the market. The use of computers with these features has an important impact on their parallel operation. Users find it difficult to distinguish which is the best solution for them when they know only the clock speed of the processor or the type of processor, such as Core i3, Core i5 or Core i7 (in this case it is often assumed that the processor has 3, 5 or 7 cores) [5].
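As a side note on the core/logical-processor distinction mentioned above, the following minimal C# sketch (not from the paper) shows what a program can query directly: the .NET runtime reports the number of logical processors, which with Hyper-Threading is typically twice the number of physical cores, so the processor model name alone does not reveal either number.

```csharp
using System;

static class ProcessorInfo
{
    static void Main()
    {
        // Number of logical processors visible to the process; with
        // Hyper-Threading this is usually twice the physical core count.
        int logical = Environment.ProcessorCount;
        Console.WriteLine($"Logical processors: {logical}");

        // The physical core count and cache sizes are not exposed here;
        // they must be obtained from the operating system, and the paper
        // therefore lists them per machine in Table I.
    }
}
```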
Programs that are written to be executed on a single processor or in a processor core without having the possibility to execute more instructions immediately, are known as serial programs, or sequential programs [1]. On the other side, programs that are written for simultaneous execution in more cores, or more logical processor, are known as parallel programs [2][13]. Parallel execution of programs in a computer is based on the parallel work of processor cores. But, using the Intel’s technology that is known as Hyper-Threading it is possible for each core to work as two logical processor [5], respectively it is possible for the processor with 4 cores to work parallel with 8 logical processors. The parallel work of microprocessors has a great influence on the calculation speed of computers which is important for the massive use of computers in communication through the Internet, also searching on the Web, in the processing of various biometric information, monitoring the climate changes in real time, scientific researches, in complicated three-dimensional games on the computer etc. Measuring the performance of computers during parallel execution of programs is not an easy task, MIPRO 2016/DC VIS because in the speed of their work, except clock speed, indicates cores number, model, the support of HyperThreading technology, organization of cache memory in separated levels [7] etc. II. EXPERIMENTAL RESULTS In order to get relevant conclusions, initially we have experimented with numerous computers that have different processor types of Intel producer. In order to measure the speed of the computer work during experimentation we have used an ordinary algorithm for multiplication of two square matrices with different dimensions, from 100 x 100 to 1000 x 1000 (10 cases). For parallelism of loops are used methods of programming language C# [3][10][12], which are included in Microsoft Visual Studio 2010 package. We have started the experimentation with computers that are built using previous processor 2 Duo, which have cache memory only on the level L2, then with Core i3, Core i5 and Core i7 processors (short i3, i5, i7) [16], whose memory are on the levels L2 and L3 and processor used for servers, such as Xeon type [15]. Finally from all the computers that are used for experimentation, we have singled out only the results of 15 computers with Intel processor, whose characteristics are given in TABLE I, without differentiating as desktop or notebook computer. TABLE I. LIST OF USED COMPUTERS . From the given table we can see that the model of the processor does not automatically determines the number of cores, neither the number of logical processors. Thus, 185 for example, computer C7 belongs to model i7 and and contains 4 cores and 8 logical processors [8]. But, his memories L2 and L3 are smaller, e.g. than at the model i5 processor of computer C2. On the other side, the computer C14 has model i7 processor, but with two cores and four logical processor [9]. At the same time, processors of earlier generation 2 Duo, which are included in the computers C9 and C12 have relatively large cache memory on the level L2, but don’t have the level L3 memory and in parallel work they have shown lower speed than computers with processors of next generation. 2. Parallel execution time During the parallel execution of the testing program in separate computers, parallel execution time (Tp) have big difference from serial execution time (Ts), which can be seen from the diagram given in Fig.2. A. 
A. Influence of the number of cores

To get a clearer picture of how much parallel execution of programs contributes, we first measured the serial execution of the test programs - matrix multiplication, horizontal image flipping and object tracking - with only one core of the processor active. Then we measured the execution time of the parallel program with all cores, i.e. all logical processors, active. Finally, using the measured results, we determined the execution speedup and the level of parallelism of the program. Since it is not possible to present all the results obtained, in the following we give only the graphical results for four computers from the list in TABLE I, whose clock speeds are at approximately the same level (above 3 GHz).

1. Serial execution time

The results of serial execution time (Ts) for the four selected computers are shown in the diagram in Fig. 1.

Figure 1. Serial execution time of the program

The diagram shows that computer C11 is the fastest; it uses a processor ranked in the group of processors that are currently the fastest [8], the i7 model, with a maximum clock speed of 3.60 GHz, 4 cores (4-c for short) and 8 logical processors (8-lp for short). Computer C2, although it is an i5 model, has a speed competitive with computer C11. This is also seen in the current ranking of Intel processors [9], where it holds a high position in the list. At the same time, computer C6 with an i5 processor is slower than computer C8, which has an i3 processor. From the diagram it is clear that the curves are ranked according to the clock speed of the processors.

2. Parallel execution time

During the parallel execution of the test program on the individual computers, the parallel execution time (Tp) differs greatly from the serial execution time (Ts), which can be seen from the diagram in Fig. 2.

Figure 2. Parallel execution time of the testing program

Here, too, computer C11 has the highest speed and computer C6 is the slowest, which is consistent with their speed in serial work. However, in this case the execution times are significantly shorter. Thus, for example, computer C11 performs the multiplication of matrices with dimensions 1000 x 1000 in a little over 2 seconds, whereas its serial execution needs more than 10 seconds. The parallel execution times of computers C11 and C2 do not differ much, despite their different processor models (i7 and i5): during parallel work 8 logical processors are active in computer C11 and 4 logical processors in computer C2. On the other hand, as the diagram shows, with increasing matrix dimensions the speed of computer C6 drops dramatically, even though it has an i5 processor with a clock speed of 3.10 GHz. From this, and from the experiments we performed with other computers, we can conclude that the speed of parallel execution of a program cannot be determined from the processor model (i3, i5 or i7) or from the number of cores alone, because it is also influenced by the clock speed of the processor.

The same conclusions were reached during image processing. In this case, we took an image and processed it using image-flipping algorithms. First we measured the execution time of the serial image processing; then the image-flipping algorithm was parallelized in order to process the image on several cores, and the execution time of the parallel image processing was measured as well. The achieved results are shown in Fig. 3. We experimented with computers with 1, 2 and 4 cores; the figure shows the average execution time of horizontal image flipping.

Figure 3. Execution time of horizontal image flipping

On computers with a single-core processor we used the serial algorithm for image flipping, and the average execution time was 783 ms. On computers with 2-core and 4-core processors the parallel algorithm was used, giving average execution times of 456 ms and 234 ms, respectively. Based on this, execution speed increases when processors with more cores are used. The interface of the application used for testing and measuring the execution time of the horizontal image-flipping algorithms is shown in Fig. 5. Part of the interface is the option to choose the number of processor cores to be used during processing, as well as the choice between the Parallel and Serial execution modes.

Figure 5. The application interface used in measuring the execution speed of the image-flipping algorithms
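As an illustration of how such an image operation can be parallelized row by row, here is a minimal C# sketch of a horizontal flip over a raw pixel buffer. It is only a sketch of the general technique, not the authors' application; the buffer layout (one byte per pixel, row-major order) is an assumption made for brevity.

```csharp
using System.Threading.Tasks;

static class FlipDemo
{
    // Horizontally flips an image stored row-major as one byte per pixel.
    // Each row is independent, so the rows are distributed across the cores.
    public static void FlipHorizontalParallel(byte[] pixels, int width, int height)
    {
        Parallel.For(0, height, y =>
        {
            int rowStart = y * width;
            for (int left = 0, right = width - 1; left < right; left++, right--)
            {
                byte tmp = pixels[rowStart + left];
                pixels[rowStart + left] = pixels[rowStart + right];
                pixels[rowStart + right] = tmp;
            }
        });
    }
}
```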
In another experiment we considered an object-tracking application, which is a time-sensitive application. The application frames the video camera input into images and then, for every image, uses contour-tracking algorithms to detect object contours. The application interface is shown in Fig. 4. The detected contours of an object are transferred and displayed in the Object tracking panel of the application, from which the relative position of the object centre is measured.

Figure 4. The application interface used in measuring the execution speed of the contour-tracking algorithm

During the experiments we measured both the serial and the parallel execution of the contour-tracking algorithm. The results are shown in Fig. 6. Based on the diagram, there is an increase of speed during parallel execution in finding the object contours.

Figure 6. The average execution time of the object-tracking algorithm on processors with different numbers of cores

As mentioned above, object-tracking applications are time-sensitive because their aim is to detect and find the object position within a specific time, so the delay between framing an image and tracking the contours of an object should be reduced. From the diagram we can conclude that the delays decrease with parallel processing of the framed image using multi-core processors. However, the rate of this decrease cannot be determined from the processor model (i3, i5 or i7) or from the number of cores alone, because it is also influenced by the clock speed of the processor, just as in the case of matrix multiplication.

3. Execution speedup

If the serial execution time (Ts) and the parallel execution time (Tp) of the test program are known, the execution speedup (S) can be calculated through the expression [6]

S = Ts / Tp     (1)

For example, the average image-flipping times reported above give a speedup of 783/456 ≈ 1.7 on a 2-core processor and 783/234 ≈ 3.3 on a 4-core processor. Using the experimental results mentioned above, the execution speedup of the 4 computers under consideration looks as given in Fig. 7.

Figure 7. Execution speedup of the execution process

The execution speedup maintains the same ordering of the curves for the individual computers as in the execution-time diagrams, but the curves vary, with increases and decreases. Towards the end, only computer C11 continues to accelerate (finally reaching a speedup of 4.5), while for the three other computers the values stay the same or decrease. As the diagram shows, and as we also observed in experiments with other computers, the execution speedup grows as the number of executed operations increases (e.g. multiplication of matrices with large dimensions), but only up to a certain threshold.
Beyond that threshold the execution speedup does not grow continually, which shows that the parallelism is not perfect.

B. The impact of cache memory

The size of the cache memory is also important for the speed of the computer. The latest generations of processor models contain cache memory on three levels. We first experimented with computers of earlier generations, with Core 2 Duo processors: TABLE I includes computers C9 and C12, whose processors have cache memory only at the L2 level, and we noticed that in parallel work they are extremely slow compared with the other computers, despite having relatively high clock speeds. We then experimented with 6 other computers, also included in TABLE I; the results obtained for them are presented in the diagram in Fig. 8.

Figure 8. Parallel execution time of computers with processors that have different cache memory

The diagram shows that computer C7, with an i7 processor, is the fastest and has a large cache memory at the third level, even though it has the lowest clock speed. Computers C2 and C1, with i5 processors, have approximately the same speed, a clock speed much higher than that of computer C7, and cache memory on at most two levels. In the overall ranking of Intel processors, the processor of computer C1 has a higher rank than that of computer C2 [9]. Computers C8 and C6, although they have i3 and i5 processors, owe their position in the speed ranking to their higher clock speeds and cache sizes. Computer C14, with an i7 processor, has the highest execution time.

C. The level of parallelism

The maximum execution speedup of a program can be determined by Amdahl's law [4][6]

S = 1 / ((1 − P) + P/N)     (2)

Here N is the number of processors and P represents the relative share of the program that is executed in parallel. Starting from this law, the expression for calculating the level of parallelism of a program can be derived through simple algebraic manipulation:

P = N(S − 1) / (S(N − 1))     (3)

Since the execution speedup values are measured and calculated using (1), and the number of cores, i.e. the number of logical processors (N), is known, the level of parallelism can be calculated through (3). Thus, for the computers with i7 processors, the level of parallelism appears as given in Fig. 9. From the diagram, and from the experiments we performed with other computers, it can be seen that the relative level of parallelism of the program execution is above 0.65 and just a bit under 1.00, which is directly linked with the speed of the parallel execution of the program. For instance, the speedup of 4.5 reached by computer C11 on 8 logical processors corresponds, by (3), to a level of parallelism of 8 · 3.5 / (4.5 · 7) ≈ 0.89.

D. The maximum rate of parallelism

During the parallel work of a computer, the operating system determines the number of threads that are activated [10], allocating to the programs that run in parallel as many threads as needed [11]. However, this can also be done by the user, by limiting the number of threads. If the parallel execution concerns only one program, the computer's work can be optimized if the number of threads is kept equal to the number of cores, i.e. the number of logical processors. To show this, we experimented with several computers; Fig. 10 gives only the diagram of the parallel execution time for the test computer C4, with the number of threads limited to values between 4 and 20.

Figure 10. The rate of parallelism
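The thread limit described above can be set in C# through ParallelOptions.MaxDegreeOfParallelism, the API documented in [11]. The following minimal sketch, with an illustrative loop body, shows the general idea rather than the authors' exact test program.

```csharp
using System;
using System.Threading.Tasks;

class DegreeOfParallelismDemo
{
    static void Main()
    {
        // Limit the parallel loop to as many threads as there are logical processors.
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.For(0, 1000, options, i =>
        {
            DoWork(i);   // illustrative work item; the real test program multiplies matrices
        });
    }

    static void DoWork(int i) { /* placeholder for the measured workload */ }
}
```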
The diagram in Fig. 10 clearly shows that the execution time is minimal when the maximum degree of parallelism is 4, which corresponds to the number of processor cores of the computer on which the program is executed. The execution time increases if a higher degree of parallelism is allowed, and the same behaviour was observed on most of the computers with which we experimented. From this we can conclude that the highest speed of parallel program execution is usually achieved when the maximum degree of parallelism is set equal to the number of processor cores.

III. CONCLUSION

The speed of serial execution of a program on different computers is determined mainly by the processor clock speed. The speed of parallel execution of a program, however, cannot be determined from the processor model or the number of cores alone, because the cache memory size and the clock speed also have an important impact. The execution speedup of a computer depends on the number of processor cores as well as on the level of parallelism of the program being executed; these conclusions were reached in the matrix-multiplication experiments as well as during image processing. The level of parallelism of a program execution, in turn, depends directly on the speed of the parallel work of the computer; on the computers with which we experimented, the level of parallelism was above 0.65 and just under 1. To optimize the parallel speed of a computer, the allowed number of threads can be manipulated: the experimental results show that, when a single program is executed, the computer usually works faster if the number of threads is kept equal to the number of processor cores.

REFERENCES

[1] Blaise Barney. Introduction to Parallel Computing. Lawrence Livermore National Laboratory. https://computing.llnl.gov/tutorials/parallel_comp/ [Accessed on 10.9.2014]
[2] Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar. Introduction to Parallel Computing, Second Edition. Addison Wesley, January 16, 2003
[3] Parallel Programming with .Net. http://blogs.msdn.com/b/pfxteam/ [Accessed on 17.8.2014]
[4] Frank Willmore. Parallel Programming. The University of Texas at Austin. https://portal.tacc.utexas.edu/c/document_library/get_file?uuid=d7f742a0-42c8-4407-b3c6-9fe1cbda14f1&groupId=13601 [Accessed on 25.11.2014]
[5] http://www.pcworld.idg.com.au/article/386100/what_difference_between_an_intel_core_i3_i5_i7_/ [Accessed on 18.12.2014]
[6] Jan Zapletal. Amdahl's and Gustafson's laws. VSB - Technical University of Ostrava, 2009
[7] Shared Memory Multiprocessors. http://www.inf.ed.ac.uk/teaching/courses/pa/Notes/lecture04-multi.pdf [Accessed on 23.12.2013]
[8] http://www.pcworld.idg.com.au/article/386100/what_difference_between_an_intel_core_i3_i5_i7_/ [Accessed on 16.12.2014]
[9] http://www.cpubenchmark.net [Accessed on 18.1.2015]
[10] Optimal number of threads per core. http://stackoverflow.com/questions/1718465/optimal-number-of-threads-per-core [Accessed on 12.3.2013]
[11] https://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism%28v=vs.110%29.aspx [Accessed on 23.1.2014]
[12] Gaston C. Hillar. Professional Parallel Programming with C#. Wiley Publishing, Inc., Indianapolis, Indiana, 2011
[13] Peter S. Pacheco. An Introduction to Parallel Programming. University of San Francisco, Morgan Kaufmann Publishers, 2011
[14] Hans-Wolfgang Loidl. Parallel Programming in C#. Heriot-Watt University, Edinburgh, 2012
[15] http://www.intel.com/content/www/us/en/processors/xeon/xeon-processor-e7-family.html [Accessed on 18.1.2015]
[16] http://www.intel.com/content/www/us/en/processors/core/5th-gen-core-processor-family.html [Accessed on 15.7.2014]
Federated Computing on the Web: the UNICORE Portal

Maria Petrova-El Sayed*, Krzysztof Benedyczak†, Andrzej Rutkowski† and Bernd Schuller*
*Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH, Germany
Email: {m.petrova, b.schuller}@fz-juelich.de
†Interdisciplinary Centre for Mathematical and Computational Modelling, Warsaw University, Poland
Email: golbi@icm.edu.pl, rudy@mat.umk.pl

Abstract—As modern science requires modern approaches, vast collaborations comprising federated resources on heterogeneous computing systems rise to meet the current scientific challenges. Due to their size and complexity, these computing systems become demanding and can further complicate the scientific process. As a result, scientists are burdened with the need for additional technical experience that lies outside their domain. UNICORE is a middleware which serves as an abstraction layer to mask the technical details and ensure easy and unified access to data and computation over federated infrastructures. The Portal is the newest client in the UNICORE portfolio, providing web access to data and computing systems. With the rising demand for an up-to-date, user-friendly graphical interface and for access to computational resources and data from any device at any point in time, the Portal meets the contemporary needs of the dynamic world wide web. This paper describes the functionality of the Portal and its advantages over other clients. It also discusses security and authentication methods offered by the Portal and presents use cases and client customisation from our practice. It concludes with ideas about future work and extension points for further simplification of scientific use.

I. INTRODUCTION

Complex scientific projects involve multiple computer simulations and calculations. In order to achieve their goals, scientists need to benefit from heterogeneous computing systems and resources, which can be distributed all over the world. The working process in such an environment is very challenging for scientists with a limited technical background. UNICORE is middleware that ensures a smoother path for researchers, allowing them to concentrate on the actual scientific problem by reducing the amount of required technical knowledge.

The development of UNICORE [1] dates back to 1996 and was initiated in an effort to provide access to the three largest German High-Performance Computing (HPC) centres. Its core design principles comprise: abstraction of resource-specific details, openness and extensibility, security, operating system independence and autonomy of resource providers. The software is available as open source from the SourceForge repository [2] under a commercially friendly licence.

The traditional end-user clients in the UNICORE portfolio are the UNICORE Commandline Client (UCC) and the desktop UNICORE Rich Client (URC). However, in the current era of technological advancement the world wide web is the main means to enable access to data.
In an answer to contemporary demands, the UNICORE Portal provides a webbased Graphical User Interface (GUI). It offers not only a userfriendly GUI for work with distributed computing systems, but also serves as an access point to compute centres. Web technologies have multiple advantages, for instance, constant availability from an arbitrary location. The remainder of this paper is structured as follows. Section II describes the functionality of the Portal. Supported authentication methods are covered in section III, which includes a concise presentation of the new Unity [3] component that plays a crucial role in many of the new development and integration scenarios. Section IV focuses on client customisations and extensions from our practice. In section V we discuss some advantages of the Portal by making a high-level comparison between the different UNICORE clients as well as by comparing the Portal to other existing web solutions. Finally, in section VI we summarise and conclude with future work. II. F UNCTIONALITY OF THE UNICORE P ORTAL The UNICORE Portal facilitates computation in federated systems and makes the grid more accessible. It is also used as a convenient access point to a particular compute centre. The functionality of the Portal covers generic use cases. Before being able to compute or manage data via the Portal, the user needs to authenticate himself at login. More information on available authentication methods is presented in section III. After successfully logging in, the user is directed to the home page of the Portal. The home page is to a great extent configurable by an administrator and can contain either a Really Simple Syndication (RSS) feed by choice, or a personalised design from a local Hypertext Markup Language (HTML) file. The Portal allows multiple customisations (done by an administrator only) like hiding certain views, branding the GUI, adding and exchanging logos and much more. MIPRO 2016/DC VIS A. Job Submission Focus points of the Portal functionality are execution of jobs, fetching of results and presentation of outcomes. By clicking on New Job from a flat navigation menu, the user is forwarded to a view for job submission (Fig. 1). Here he can select an application from a pre-filled list, which is being dynamically updated. The information about the applications in this list is fetched from the configuration files of the UNICORE sites which are accessible to the user. In this view, job descriptions and application specific input parameters can be entered in a graphical form. The latter would have normally been put into a script and manually submitted for execution. A GUI client eliminates the need to look into a detailed job script by automatically generating it in a ready to be submitted state. The user has to only prepare the necessary data and set the desired arguments. He can as well upload files from a remote or local storage. When clicking on the corresponding button, a window with the accessible storages will be open. Then the user can browse a selected storage and navigate through its folders in a manner resembling work within a file system. He also has the possibility to create new files where he can write necessary algorithms or other required input information. Most of the local or remote files can be changed on the fly through a file editor. However, restrictions on the size prohibit opening and editing of too large files from within the Portal. 
A stage-out destination of objects to a desired output location can also be specified for every job. For example, the user can define that the job output be copied to a remotely located folder of choice. Already in this view the user can see which sites he has access to and can choose where to send his job for execution. If none selected, the UNICORE server will take the decision based on a thorough brokering algorithm, which takes into account the needed resources, the load of the sites, etc. The required resources for the successful completion of his job can also be specified by the user in a comfortable GUI form. Furthermore, multiple jobs can be edited at the same time in multiple windows. Full job definitions can be saved and restored, thus allowing for parameter fitting and re-submission. B. Monitoring The monitoring of the job results is represented in a dedicated job table on a separate view. The table contains the job name, status, submission time, tags and other details. The update for each status change or a new job submission happens automatically. The user can sort, filter and search the table entries, as well as re-submit, abort or delete jobs. He can also browse the working directory. If the job’s execution completed successfully, the working directory contains all job related files including the fetched output results. These can normally be viewed in the browser or downloaded to the local machine. As expected, the user can monitor and operate only with his own data and only after a successful authentication. In case of an unsuccessful job execution, the working directory stays empty as no output has been produced. However, the MIPRO 2016/DC VIS Fig. 1. The view for preparation and submission of generic jobs. user can view the reason for failure by clicking on a button which opens a window with more detailed information. C. Data Manager The Portal offers a powerful data manager which assists users in data-related tasks. The intuitive look & feel ensures a comprehensive GUI for transfer, upload or download of data within different locations. Users are able to create, preview and delete files in the browser as well as edit their content on the fly. The data manager supports both client-server upload/download and server-server transfers with several available data transport protocols. High throughput protocols such as UNICORE File Transfer Protocol (UFTP) [4] are also included. D. User Workspace Another typical characteristic is the so called user workspace. It enables the storage of data directly in the Portal in order for them to be accessible from anywhere on the web. For example, the data from submitted jobs are being preserved in the workspace. Each user has his own separate workspace, which is always created by default. However, for security reasons the Portal administrator has the possibility to hide the workspace from the user, thus restricting his access. If access is granted, the user has full control over the data from his workspace with the help of the data manager. 191 E. Workflows The Portal supports the creation and submission of elementary workflows. With a few mouse clicks the user can compose a graph of jobs with the required connections, parameters, imports and exports. The workflow editor is meant to be simple and intuitive and serves only the basic needs, i.e. only task sequences. All the advanced features, such as conditional and loops, are currently omitted. 
Identical to the job table, there is a separate table, containing all submitted workflows and their details. The user can browse through a workflow’s working directory in a manner resembling browsing through a file system. F. Sites view The sites view contains information about all the accessible to the user sites. The user can track each site’s availability, the number of its cores, its installed applications and more. If the geographic location is configured, it will be marked on Google Maps for a clear representation of the infrastructure. The sites view is just informative and has no influence on computing. G. Configurations The Portal is designed to be flexible and allows for multiple configurations. All settings including authentication modes, accessible compute centres, graphical customizations of logos, home page, main menu entries and more are defined in a properties file. This file is managed by a so called Portal administrator who is responsible for the initial set up of a Portal instance. The end user normally does not have access to the configuration file and does not need to know or manage the Portal instance itself. His working process starts from the authentication point. H. Internationalisation Last but not least, the Portal supports internationalisation. The default language is English. However, adding a new language to the GUI is uncomplicated and does not require any programming effort. All the messages are defined in properties files which can be translated. With a simple configuration from a Portal administrator, the new language will appear for selection together with all available languages at the login screen. III. S ECURITY AND AUTHENTICATION The certificates of the X.509 Public Key Infrastructure system are commonly used by grid middleware as a base of authentication and trust delegation, needed to coordinate distributed job execution. This approach turned out to be a major obstacle in grid adoption: certificates are difficult to obtain, distribute, refresh and manage. Conversion between different formats, installation in clients, etc. are problematic as well. Web scenarios are even more troublesome. The Portal is a middleware client and is accessed itself by the user’s browser. Therefore, the user’s private key cannot be retrieved by the grid client—i.e. the Portal. Even if the user installs a certificate in 192 the browser, it can only be used for the authentication step. As browsers do not support grid-specific trust delegation, which is needed by the grid, this has to be handled differently. The UNICORE Portal approach to authentication and trust delegation addresses the aforementioned issues. It preserves backwards compatibility and does not scarify an overall security. Authentication in the Portal is designed in a modular way. Each authentication module provides support for a different authentication mechanism. In general these mechanisms can be grouped into two categories: local mechanisms and remote mechanisms. The local authentication is handled fully by the Portal. Currently implemented local authentication modes comprise: • An X.509 certificate—pre-installed in the user’s web browser. • User name and password—for simple installations as it lacks advanced features such as password reset. • Demonstration account—a shared pseudo account that can be enabled for demonstrative access. In order to use the UNICORE infrastructure after any of the local authentication mechanisms, the user is prompted to upload a so called trust delegation token to the Portal. 
The only exception is the demonstrative account. This is a solution to the delegation problem, as the private key stays at the user’s machine. The drawback is that it requires the user to start a Java Web Start application (served by the Portal) and provide X.509 credentials to this application. The application generates a time-limited delegation token and uploads it to the Portal. This extra step is executed roughly once a month depending on the user’s trust in the Portal. Nevertheless, it can be seen as an obstacle for portal users as it requires possession of a certificate and a potentially difficult action. To offer first class authentication, the UNICORE Portal also supports a remote authentication over the Security Assertion Markup Language (SAML) protocol. The SAML authentication using the web POST profile is described in the OASIS SAML Standard [5]. Besides the authentication itself, support for an additional trust delegation assertion was added. Such assertion must be generated by a universally trusted third party service and returned to the Portal together with the SAML authentication assertion. The role of a trusted delegation and authentication authority is fulfilled by a Unity service [3]. Unity is a powerful identity provider, featuring support for outsourcing authentication to different external identity providers, using both OAuth and SAML protocols. Through Unity, a middleware can be configured with an arbitrary login mechanism. Unity can enable one or more from the standard solutions such as password, authentication with data centre Lightweight Directory Access Protocol (LDAP), home Identity Provider (IdP) of a SAML federation, etc. It can even use a social identity provider like Google, Facebook or Github to name a few. It should be underlined that the configuration of the selected authentication mechanisms is done by a Portal administrator and is transparent for the end user. Thus the choice of allowed authentication sources is fully controllable and the Portal’s MIPRO 2016/DC VIS authentication uses UNICORE trust delegation without any tricks like generation of short lived certificates or storing user’s private key in a web server. IV. U SE CASES AND USER CUSTOMISATION A. Generic use case Already in its default configuration, the UNICORE Portal can be used for a wide variety of use cases. Access to a federated infrastructure of HPC resources is one of the primary design goals. As a concrete example for this use case, we would like to mention the Human Brain Project (HBP) [6], a European FET flagship project. The HBP targets a wide range of topics, aiming at a deeper understanding of the human brain and attempting to leverage this understanding for new technological advancements. The HBP operates on HPC resources, integrating major supercomputing sites in Jülich, Barcelona, Bologna and Lugano as well as cloud storage and other resources. The HBP uses a single-sign-on system based on OpenID Connect (OIDC) [7]. With UNICORE deployed as underlying middleware, the UNICORE Portal is used as a simple way to access the HPC and data resources. The Portal is configured to use Unity for authentication, and is thus integrated with the single-sign-on functionality. 
The user benefits already in this simple scenario from the abstraction provided by UNICORE and the single-sign-on functionality; it is no longer required to know and understand how to log into the various resources, or be aware of the various batch schedulers that are used at the various sites (e.g., Slurm and IBM LoadLeveler). The Portal is used to prepare and submit so called generic jobs. The user interface for these generic jobs is generated at runtime based on metadata provided by the UNICORE server. B. Custom plugins The generic approach is sufficient for the majority of applications, especially those whose users were traditionally preparing input files in some well-established format. However, there are cases when a more sophisticated interface is welcomed. For example, some applications require special resource allocations such as CPU, number of nodes, used memory or architectures, which might all depend on other application parameters. A specialised application interface can assist with these complex resource settings. Customisations to the interface can also improve the preparation of input. For instance, selecting parts of the image for a visual analysis is much more convenient when clicking on the image itself than by providing numeric pixel coordinates. Even though the generic application description is quite powerful and allows for expressing the most common dependencies, some complex relationships between input settings may require special implementation. Furthermore, for many applications output visualisation is a key feature and its proper placement in the Portal is crucial for effective research. As the above aspects were known from the very beginning, the core Portal is designed as a base for development of applications or domain specific plugins. A plugin is free to MIPRO 2016/DC VIS implement an arbitrary user interface as well as to contribute with one or more entries to the main menu. The administrator has control over all menu entries and can select which of them to be shown on the GUI. This allows for tailoring of the Portal to particular domain needs even to a degree where a generic job interface is not presented at all. The development of a portal plugin is restricted to the base technology of the Portal, which includes Vaadin [8] as the underlying framework for building user interfaces in Java and Spring [9] as a container for objects. These restrictions are counterbalanced by the possibility to use internal portal components and features to speed up the plugin development. For example, multiple graphical elements such as the job table can be re-used from the existing implementation. Already present are also the various user authentication methods as well as a grid security context for grid interaction like asynchronous discovery of resources and jobs. Thus, the development can be focused purely on application domain aspects. We present here two examples of such custom modules developed in PL-Grid Plus and PL-Grid NG projects. The first one is named SinusMed. SinusMed processes an input, that being a series of computed tomography (CT) images, which form a 3D image of a human head. It detects all air-filled head areas. Areas are subsequently marked and categorised using reference data sets prepared by medical doctors. Results of the simulation comprise several sets of layered masks on the input image, which must be presented in a visual form. Another example is the AngioMerge application. 
It is used to synchronise several angiograms, which are taken periodically with an interval of several minutes. The output is supposed to be displayed in 3D, optionally as an animation. The portal modules for SinusMed and AngioMerge allow for a simplified job preparation, which focuses on the actual application input and automatically pre-sets the required resources for each application. Therefore, the user is not burdened with additional knowledge about the computing infrastructure being used. What is more, applications submit the jobs with the help of a UNICORE broker so that site selection is fully automated. The most attractive features of the above application specific interfaces are found in the output visualisation part. The SinusMed directly shows the original CT images in the browser (Fig. 2), enabling the overlay of synchronised masking layers, computed by the application. The AngioMerge module embeds a WebGL 3D interface (Fig. 3) allowing for interactive viewing of the output and even playing the resulting animation. V. C OMPARISON WITH OTHER SOLUTIONS A. With Other UNICORE Clients The UCC is a commandline tool that enables job and workflow management from a shell or scripting environment [10]. It covers all features of the UNICORE service layer including data transfers, job and workflow submission and monitoring, results fetching, etc. Among its characteristic qualities is the possibility to submit multiple jobs in an automated fashion via a batch mode. To speed up the work, new customised commands can be defined by the user. However, the UCC is 193 Fig. 2. Results of sinuses detection embedded in the Portal interface after a simulation in a low quality (fast) mode. Fig. 3. Output visualisation of the AngioMerge module directly in the portal web interface. oriented towards experienced computer users, who find working with a terminal faster and more convenient. In contrast, there are scientists who welcome an easy to use GUI. The URC [11] is a standalone client that provides a detailed graphical interface for the UNICORE functionality. Similar to UCC, the URC supports all features like job and workflow creation and submission, retrieval of results, data transfers and grid browsing. In its essence the software is based on Eclipse [12], which makes it easily extensible via the plugin mechanism. Specific to the URC are the different perspectives, which hide or show features from the UI in accordance to how advanced the user is. The major strength of this client is its powerful workflow editor with constructs like loops and conditional statements for steering the control flow and enabling a fully automated work process. In comparison to the URC, the functionality of the Portal’s workflow editor is elementary. We have taken this approach due to feedback from our users. In order to use the advanced structures, many users still need the assistance of a UNICORE specialist. This is one of the main reasons why we decided to keep the design of the Portal’s workflow editor simple and light-weight. Thus, we ensure a comprehensive and intuitive interface that covers the basic need, whereas the complex use cases can benefit from the well-developed and well-established editor of the URC. Federated computing on the web has multiple advantages to desktop clients. Unlike the UCC and the URC, the Portal is available from any location and can be accessed by various devices. 
While standalone clients require installation, administration and regular updates done by the user himself, a web portal releases the user of this responsibility, thus saving time and effort. B. With Other Existing Web Clients 194 There are several existing web portals that concentrate on HPC. One such solution is the InSilicoLab [13]. It is an environment built to assist researchers in their specific domain. In principle it has been designed to be generic, but it is currently used only by projects in chemistry [14]. Similar to the UNICORE Portal, this tool offers access to distributed resources on the web, job preparation, submission and monitoring as well as fetching of results and data management. However, the InSilicoLab has rather specialised into domainspecific interfaces. In contrast, the Portal is more generic with the intention to be an easily extensible solution that can be beneficial in various research domains. The Portal is also more flexible with its numerous authentication possibilities through the integration with Unity. Another web client for HPC is Adaptive Computing’s Moab Viewpoint [15]. The Moab Viewpoint is solely concentrated on job and data management and does not cover any workflow features. However, it offers an interface for creating application templates, which helps in optimizing application run times and reduces errors. Another enticing feature, which is not present in the UNICORE Portal, is the existence of two GUI modes: an end user and an administrator one. The administrator mode enables supplementary functionality like reporting on resource utilisation, troubleshooting of the workload, editing or cancellation of jobs and tracking of node-usage. The Portal can be customized by an administrator through a properties file but this should not be confused with the administrator mode from Viewpoint. The administration in the Portal is not job based and offers no GUI. It concerns the enabled options like languages, authentication methods, menu entries, etc. for a particular Portal installation. One drawback of MIPRO 2016/DC VIS the Moab Viewpoint is that it is a commercial software, which means extra costs incurring for the users. Furthermore, Viewpoint does not operate in a distributed environment, but only supports work on a singular site. VI. C ONCLUSION The UNICORE Portal is an open source solution that facilitates scientific work by enabling federated computing on the web. It offers a convenient and friendly GUI for job and workflow submission, monitoring and data management. The authentication to the Portal is done either locally or with the help of Unity, which allows for using numerous authentication mechanisms, including social and federated ones, without the need to possess an X.509 certificate. The default Portal implementation covers the generic use case, which is sufficient for most projects. We presented as an example the integration of the Portal in the HBP platform. The Portal is also flexible and, due to its modular design, it is easily extensible by self-developed plugins for specific applications. The creation of a new plugin requires solely the effort that is necessary to add the particularities of the targeted domain, whereas the base portal components and features can be easily re-used. We described two use cases with custom modules, namely the SinusMed and the AngioMerge applications. We also made a concise comparison between the Portal and the other clients in the UNICORE portfolio with their strong points and disadvantages. 
We discussed related web solutions for HPC environments. The Portal’s development does not stop here. It faces multiple milestones on its future path. One of them, which is under current development, is the workflow template feature. The Portal will offer the option to import a ready template file. The latter will be parsed and a familiar graphical form will be automatically generated. Then the user will need to fill the data and submit the updated workflow without having to compose a new one from scratch. This approach will ensure a comfortable parameter fitting, simple re-use of workflows and uncomplicated workflow submission. Another idea that is also in progress is the integration with data sharing solutions. Scientists will be able to share with others their files, such as workflow templates, from within the Portal’s interface. R EFERENCES [1] A. Streit, P. Bala, A. Beck-Ratzka, K. Benedyczak, S. Bergmann, R. Breu, J. M. Daivandy, B. Demuth, A. Eifer, A. Giesler, B. Hagemeier, S. Holl, V. Huber, N. Lamla, D. Mallmann, A. S. Memon, M. S. Memon, M. Rambadt, M. Riedel, M. Romberg, B. Schuller, T. Schlauch, A. Schreiber, T. Soddemann, and W. Ziegler, “UNICORE 6 – recent and future advancements,” Annals of telecommunications-annales des télécommunications, vol. 65, no. 11-12, pp. 757–762, 2010. [2] “UNICORE Open Source project page,” [accessed: 2016-02-01]. [Online]. Available: http://sourceforge.net/projects/unicore/ [3] “Unity project website,” February 2016, [accessed: 2016-02-01]. [Online]. Available: http://www.unity-idm.eu [4] B. Schuller and T. Pohlmann, “UFTP: High-performance data transfer for UNICORE,” in Proceedings of 7th UNICORE Summit 2011, ser. IAS Series, no. 9. Forschungszentrum Jülich GmbH, 2011, pp. 135–142. [5] J. Hughes, S. Cantor, J. Hodges, F. Hirsch, P. Mishra, R. Philpott, and E. Maler, “Profiles for the OASIS Security Assertion Markup Language (SAML) V2.0 OASIS Standard,” 3 2005, [accessed: 2016-02-01]. [Online]. Available: http://docs.oasis-open.org/security/saml/v2.0/ [6] “Human Brain Project,” [accessed: 2016-02-01]. [Online]. Available: http://www.humanbrainproject.eu/ [7] “OpenID Connect,” [accessed: 2016-02-01]. [Online]. Available: http://openid.net/connect [8] “Vaadin framework website,” [accessed: 2016-02-01]. [Online]. Available: https://vaadin.com/ [9] R. Johnson, J. Höller, K. Donald, C. Sampaleanu, R. Harrop, T. Risberg, A. Arendsen, D. Davison, D. Kopylenko, M. Pollack et al., “The spring framework–reference documentation,” Interface, vol. 21, 2004. [10] “UNICORE commandline client: User Manual,” [accessed: 2016-02-01]. [Online]. Available: http://unicore.eu/documentation/manuals/unicore/files/ucc/uccmanual.html [11] B. Demuth, B. Schuller, S. Holl, J. Daivandy, A. Giesler, V. Huber, and S. Sild, “The UNICORE Rich Client: Facilitating the automated execution of scientific workflows,” in e-Science (e-Science), 2010 IEEE Sixth International Conference on, Dec 2010, pp. 238–245. [12] “The Eclipse Foundation open source community website,” [accessed: 2016-02-01]. [Online]. Available: https://eclipse.org/ [13] J. Kocot, T. Szepieniec, D. Har˛eżlak, K. Noga, and M. Sterzel, “Insilicolab – managing complexity of chemistry computations,” in Building a National Distributed e-Infrastructure–PL-Grid. Springer, 2012, pp. 265–275. [14] A. Eilmes, M. Sterzel, T. Szepieniec, J. Kocot, K. Noga, and M. Golik, “Comprehensive support for chemistry computations in PL-Grid infrastructure,” in eScience on Distributed Computing Infrastructure. Springer, 2014, pp. 250–262. 
[15] “Adaptive computing: Viewpoint portal,” [accessed: 2016-02-01]. [Online]. Available: http://www.adaptivecomputing.com/products/hpcproducts/viewpoint/ ACKNOWLEDGEMENT This work was made possible with assistance of the PL-Grid Plus project, contract number: POIG.02.03.00-00-096/10, and the PL-Grid NG project POIG.02.03.00-12-138/13, website: www.plgrid.pl. The projects are co-funded by the European Regional Development Fund as part of the Innovative Economy program. The research leading to these results has also received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 604102 (Human Brain Project). MIPRO 2016/DC VIS 195 Problem-Oriented Scheduling of Cloud Applications: PO-HEFT Algorithm Case Study E.A. Nepovinnykh and G.I. Radchenko *South Ural State University, Chelyabinsk, Russia nepovinnykhea@susu.ru, gleb.radchenko@susu.ru Abstract - Today we see a significantly increased use of problem-oriented approach to the development of cloud computing environment scheduling algorithms. There are already several such algorithms. However, a lot of these require that the tasks within a single job are independent and do not account for the execution of each task and the volume of data transmitted. We propose a model of problem-oriented cloud environment. Using this model, we propose a list-based algorithm of problem-oriented planning of execution of applications in a cloud environment that considers the applications' execution profiles, based on a Heterogeneous Earliest-Finish-Time (HEFT) algorithm. Keywords – scheduling, execution planning, cloud computing, grid computing, HEFT I. INTRODUCTION Today a lot of complex e-Science tasks are solved using computer simulation which usually requires significant computational resources usage [1]. Moreover, the solutions, developed for such tasks are often characterized by structural complexity, which causes different resources (informational, software or hardware) to be integrated within a single solution. The complexity of the solutions grows as the multidisciplinary tasks are considered. Today’s common approach for building composite solutions is based on Service-Oriented Architecture [2] which forms the basis from interconnection of services and hiding their complexity behind their interfaces. Interconnection of the services within complex tasks is usually implemented in a form of workflow structures, which exploits graph-based structures to describe interconnection of used services. On the other hand, today the Cloud Computing concept is developed as a business framework for providing on-demand services supporting computing resources’ consolidation, abstraction, access automation and utility within a market environment. The service-oriented architecture in the cloud is best implemented using the microservice approach. The microservice model describes a cloud application as a suite of small independent services, each running in its own container and communicating with other services using lightweight mechanisms. These services are built around separate business capabilities, independently deployable and may be written by different development teams using different programming languages and frameworks [3]. to To provide scientists and engineers a transparent access the computing resources a “Problem Solving The reported study was partially supported by RFBR, research project No. 14-07-00420-a and by Grant of the President of the Russian Federation No. 
МК-7524.2015.9 196 Environment” (PSE) concept is commonly used. A PSE is a system that provides all the computational facilities necessary to solve a target class of problems. It uses the language of the target class and users need not have specialized knowledge of the underlying hardware or software [4]. At present, PSE researchers are investigating a variety of fields, e.g., Cloud computing support, education support, CAE usage support, document generation support, and so on. Today most of the systems that provide a problemoriented approach to e-science problems on the basis of high performance computing resources use workflows to organize a computational process [5]. Nodes of such workflows represents separate tasks implemented by individual services, and the edges define the data or control flow. In this paper, under the “Problem Solving Environment” term we would understand a set of services, software and middleware focused on the implementation of workflows to solve e-Science problems in a specific problem domain, using resources of cloud computing system [6]. Within a problem domain of PSE, a set of tasks, forming the workflow, is predetermined. Those tasks can be grouped into a finite set of classes. Task class is a set of tasks that have the same semantics and the same set of input parameters and output data. On the one hand, this imposes restrictions on the class of problems that can be solved using the PSE. On the other hand, such restriction allows to use a domain-specific information (such as task execution time on one processor core, scalability limits, and the amount of generated data) during resources allocation and scheduling, increasing the efficiency of use of available computational resources. So, in order to increase efficiency of distributed problem-oriented computer environments it is feasible to use problem-oriented task scheduling methods that use domain-specific information in order to predict computational attributes of a particular workflow. The main goal of the research is to develop a scheduling algorithm for a workflow-based problem-solving environment, which would effectively use a domainspecific information (such as task execution time, scalability limits, and the amount of data transfer) for prediction of cloud computing environment resources load. This paper is organized as follows. In section II we present the concept and the basic idea of scheduling applications in cloud environments. In section III we MIPRO 2016/DC VIS describe the cloud-based problem solving environment model. In section IV we describe HEFT and PO-HEFT cloud scheduling algorithms complete with a mathematical task model. In section V we describe the implementation of PO-HEFT algorithm in Workflow Sim’s cloud environment simulation package. In section VI we summarize the results of our research and give further research directions. II. SCHEDULING APPLICATIONS IN CLOUD ENVIRONMENTS Analysis of the main trends in resource scheduling research in distributed problem-oriented environments shows that the theme of the problem-oriented scheduling and prediction of environment load is an urgent task. In the cloud computer data centers, Holistic Model for Resource Representation is used in virtualized cloud computing data [7]. This model is designed to represent physical resources, virtual machines, and applications in cloud computing environments. The model can be applied to represent cloud applications, VMs, and physical hosts. 
Each of these entities is described by multiple resources: computing, memory, storage, and networking. A holistic model increases the precision of cloud environment simulation and enables a number of new simulation scenarios focused on heterogeneity of the hardware resources and virtualization. The model distinguishes between computing, memory, storage, and networking types of resources. However, the model can easily scale to include other types of resources as well, e.g., additional GPGPU units. New cloud-related techniques for resource virtualization and sharing and the corresponding service level agreements call for new optimization models and solutions. Computational Intelligence proves to be applicable to multiple resource management problems that exist at all layers of Cloud computing. Standard optimization objectives for scheduling are to minimize makespan and cost, but additional objectives may include optimization of energy consumption or communications. Solutions to this multi-objective optimization problem include but are not limited to: Improved Differential Evolutionary Algorithm combined with the Taguchi method, Multi-Objective Evolutionary Algorithm based on NSGA-II, Case Library and Pareto Solution based hybrid GA Particle Swarm Optimization, Auction-Based Biobjective Scheduling Strategy etc. [2]. The main drawback of mentioned algorithms is the fact that they do not use information about previous executions. The main reason that traditional cluster and grid resource allocation approaches fail to provide efficient performance in clouds is that most of cloud applications require availability of communication resources for information exchange between tasks, with databases or the end users [10]. CA-DAG model for cloud computing applications, which overcomes shortcomings of existing approaches using communication awareness. This model is based on Directed Acyclic Graphs that in addition to computing vertices include separate vertices to represent communications. Such a representation allows making separate resource allocation decisions: assigning processors to handle computing jobs, and network resources for MIPRO 2016/DC VIS information transmissions. A case study is given and corresponding results indicate that DAG scheduling algorithms designed for single DAG and single machine settings are not well suited for Grid scheduling scenarios, where user run time estimates are available. For practical purposes quite simple scheduler MaxAR with minimal information requirements can provide good performance for multiple workflow scheduling [11]. In real Grid environments this strategy might have similar performance comparing with the best ones when considering approximation factor, mean critical path waiting time, and critical path slowdown. Besides the performance aspect the use of MaxAR does not require additional management overhead such as DAG analysis, site local queue ordering, and constructing preliminary schedules by the Grid broker. It has small time complexity. This approach is related with offline scheduling which can be used as a starting point for addressing the online case. Online Grid workflow management brings new challenges to above problem, as it requires more flexible load balancing workflows and their tasks over the time. Nowadays the shifting emphasis of clouds towards a service-oriented paradigm has led to the adoption of Service Level Agreements (SLAs) [12]. 
The use of SLAs has a strong influence on job scheduling, as schedules must observe quality of service constraints. In terms of minimizing power consumption and maximizing provider income Min-e outperforms other allocation strategies. The strategy is stable even in significantly different conditions. The information about the speed of machines does not help to improve significantly the allocation strategies. When examining the overall system performance on the real data, it is determined that appropriate distribution of energy requirements over the system provide more benefits in income and power consumption than other strategies. Mine is a simple allocation strategy requiring minimal information and little computational complexity. Nevertheless, it achieves good improvements in both objectives and quality of service guarantees. However, it is not assessed its actual efficiency and effectiveness. One of the most popular algorithms is scheduled listbased algorithm Min-min [13]. Min-min sets high scheduling priority to tasks which have the shortest execution time. The main drawback of scheduled list-based algorithms is that they do not analyze the whole task graph. One of the important classes of computational problems is problem-oriented workflow applications executed in distributed computing environment [14]. A problemoriented workflow application can be represented by a directed graph whose vertices are tasks and arcs are data flows. Problem-oriented scheduling (POS) algorithm is proposed. The POS algorithm takes into account both specifics of the problem-oriented jobs and multi-core structure of the computing system nodes. The POS algorithm is designed for use in distributed computing systems with manycore processors. The algorithm allows one to schedule execution of one task on several processor cores with regard to constraints on scalability of the task. Cloud computing can satisfy the different service requests with different configuration, deployment condition and service resources of various users at different 197 time point. With the influence of multidimensional factors, it is unreality to test with different parameters in actual cloud computing center. Typical Tools for Cloud Workflow Scheduling Research are CloudSim and WorkflowSim [9]. CloudSim is a toolkit (library) for simulation of cloud computing scenarios. It provides basic classes for describing data centers, virtual machines, applications, users, computational resources, and policies for management of diverse parts of the system (e.g., scheduling and provisioning). WorkflowSim extends the CloudSim simulation toolkit by introducing the support of workflow preparation and execution with an implementation of a stack of workflow parser, workflow engine and job scheduler. WorkflowSim is used for validating Graph algorithm, distributed computing, workflow scheduling, resource provisioning and so on. Compared to CloudSim and other workflow simulators, WorkflowSim provides support of task clustering that merges tasks into a cluster job and dynamic scheduling algorithm that jobs matched to a worker node whenever a worker node become idle. In the following sections, we would present a new problem-oriented resource-scheduling algorithm for distributed computing environments, which uses heuristic score-based approach based on the HEFT algorithm for the task of the problem-oriented scheduling in cloud environments. III. 
III. CLOUD-BASED PROBLEM SOLVING ENVIRONMENT MODEL
Let us define a model of a cloud problem solving environment (PSE), so that we can simulate the task scheduling algorithm. We work under the conditions where a set $\mathfrak{M}$ of virtual machines $\mathfrak{m} \in \mathfrak{M}$ is distributed over all available nodes $\mathfrak{n} \in \mathfrak{N}$ of the cloud platform. Let us define a virtual machine image performance factor as $\pi: \mathfrak{m} \to \mathbb{Z}_{+}$, where $\mathfrak{m}$ is a virtual machine image. Numerical characteristics of a virtual machine image, synthetic test results, or test execution results of existing functions can serve as examples of such performance characteristics [15, 16]. In order to maximize the quality of the prediction of task characteristics on a specified machine, we need to take into account several performance characteristics, including the number of available processors and the amount of memory, the CPU frequency, the hard drive data exchange speed, LINPACK test results, and so on. Thus, let us define a vector $\Pi$ of the performance characteristics of the virtual machines deployed in the cloud-based PSE: $\Pi = (\pi_1, \pi_2, \dots, \pi_r)$. Each machine $\mathfrak{m} \in \mathfrak{M}$ in a cloud-based PSE is associated with a performance characteristics vector $\Pi$, which reflects the performance values of that machine: $\Pi: \mathfrak{M} \to \mathbb{Z}_{+}^{r}$.
Let us define the set of tasks that can be executed in a PSE as a set $\mathcal{F}$ of functions $f \in \mathcal{F}$. Each function $f: \mathcal{C}^{in} \to \mathcal{C}^{out}$ receives $n$ information objects $\mathcal{I}^{in} = (I_1^{in}, \dots, I_n^{in})$ of classes $\mathcal{C}^{in} = (C_1, \dots, C_n)$. The result of the function is $m$ new information objects $\mathcal{I}^{out} = (I_1^{out}, \dots, I_m^{out})$ of classes $\mathcal{C}^{out} = (C_1', \dots, C_m')$. We assume that in our model each task of a workflow is allocated to one virtual machine. Direct access to the components of the computing system is not provided.
One particular feature of a problem-oriented computing environment is that such an environment uses information about the features of task classes during scheduling and resource provisioning. We require that every task class has the following functions defined for the prediction of the task execution process depending on the input parameters: 1) an output data volume estimation function; 2) a task execution time estimation function for a machine with a given vector of performance characteristic values. To implement problem-oriented scheduling, let us define two operators, which should be implemented in the PSE: 1) the operator of the expected output $\nu(f, \mathcal{I}^{in})$, which returns the expected total size in bytes of the output data objects $\mathcal{I}^{out}$ for the function $f: \mathcal{C}^{in} \to \mathcal{C}^{out}$: $\nu(f, \mathcal{I}^{in}) = |\mathcal{I}^{out}|$; 2) the operator of the expected function execution time $\tau(f, \mathcal{I}^{in}, \Pi)$, which returns the estimated run time (in seconds) of a function $f: \mathcal{C}^{in} \to \mathcal{C}^{out}$ for a given set of input data $\mathcal{I}^{in}$ on a given machine with the performance characteristics vector $\Pi$: $\tau: (f, \mathcal{I}^{in}, \Pi) \to \mathbb{N}$.
The execution time of a function $f: \mathcal{C}^{in} \to \mathcal{C}^{out}$ on a given machine with a performance values vector $\Pi$ can thus be defined as an operator that takes the input information objects vector $\mathcal{I}^{in}$. Unfortunately, it is impossible to estimate a function's execution time with absolute accuracy, because the computations involved in preparing the output information objects $\mathcal{I}^{out}$ may indirectly depend on multiple factors that our model does not account for, including, but not limited to, background processes, available cache volume, branch prediction rate, etc.
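The two operators can be read as the interface that every task class has to provide. The following fragment only fixes the signatures implied by the definitions of $\nu$ and $\tau$; the names are illustrative, and the prototype described later in the paper is implemented in Java, so this C++ sketch is not the authors' code:

```cpp
#include <cstdint>
#include <vector>

// Illustrative model types (names are assumptions, not the paper's API).
using PerformanceVector = std::vector<double>;   // Pi = (pi_1, ..., pi_r)
using InputObjects      = std::vector<double>;   // numeric features describing I^in

struct PseFunction {
    virtual ~PseFunction() = default;
    // nu(f, I^in): expected total output size in bytes
    virtual std::uint64_t expectedOutputBytes(const InputObjects& in) const = 0;
    // tau(f, I^in, Pi): expected run time in seconds on a machine with vector Pi
    virtual double expectedRunTimeSec(const InputObjects& in,
                                      const PerformanceVector& pi) const = 0;
};
```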
In order to take into account the inherent inaccuracy of the execution time estimate, it can be modelled as a random value that is the sum of two parts: $\chi(f, \Pi, \mathcal{I}^{in}) = \tau(f, \Pi, \mathcal{I}^{in}) + \alpha$, where $\tau(f, \Pi, \mathcal{I}^{in})$ is a deterministic function that represents the dependency of the execution time of the function $f$, running on a machine with performance values vector $\Pi$, on the input information objects vector $\mathcal{I}^{in}$, and $\alpha$ is a stochastic value with expected value $M(\alpha) = 0$ that represents the factors our model does not account for.
When a specific function (task) of the PSE is executed on a virtual machine with a predefined performance characteristics vector $\Pi_0$, the expected value $E[\chi(f, \Pi_0, \mathcal{I}^{in})]$ can be evaluated using the k-nearest neighbor method, based on records of the execution times of previous launches of the function $f$ on the same machine with close values of the input parameters:
$E[\chi(f, \Pi_0, \mathcal{I}^{in})] = \tau(f, \Pi_0, \mathcal{I}^{in}) = \frac{1}{k}\sum_{i=1}^{k} W_i(\mathcal{I}^{in}) \, t_{\mathcal{I}_i^{in}, \Pi_0}$   (1)
where $\tau(f, \Pi_0, \mathcal{I}^{in})$ is a weighted average execution time estimate for the function $f$ with the input parameters $\mathcal{I}^{in}$, based on $k$ previous observations of the execution time $t_{\mathcal{I}_i^{in}, \Pi_0}$ of the function $f$ with the input parameters $\mathcal{I}_i^{in}$ on a machine with performance characteristics vector $\Pi_0$. The weighting function $W_i(\mathcal{I}^{in})$ assigns greater weight to the execution time records whose input parameter values are closer to $\mathcal{I}^{in}$.
To take into account the possibility of executing the function $f$ on a virtual machine with a characteristics vector different from $\Pi_0$, we can extend the definition of the evaluation parameter vector by appending the virtual machine performance vector $\Pi = (\pi_1, \dots, \pi_r)$ to the vector $\mathcal{I}^{in}$ of input parameters. Thus, we use as a characteristic the vector $P$ of dimension $n + r$, where $n$ is the number of input parameters of the function $f$ and $r$ is the number of virtual machine performance characteristics: $P = [I_1^{in}, \dots, I_n^{in}, \pi_1, \dots, \pi_r]$. In this case, (1) can be transformed into the following form:
$E[\chi(f, \Pi, \mathcal{I}^{in})] = \tau(f, \Pi, \mathcal{I}^{in}) = \frac{1}{k}\sum_{i=1}^{k} W_i(P) \, t_{P_i}$
where $t_{P_i}$ is the value of a previous observation of the execution time of the function $f$ with the evaluation parameter vector $P_i$. Similarly, we can estimate the amount of output data for the function $f$:
$\nu(f, \mathcal{I}^{in}) = \frac{1}{k}\sum_{i=1}^{k} W_i(\mathcal{I}^{in}) \, v_{f, \mathcal{I}_i^{in}}$
where $\nu(f, \mathcal{I}^{in})$ is an estimate of the volume of the output parameters of the function $f$, based on information about the output volumes of $k$ previous runs of the same function with the input parameters $\mathcal{I}_i^{in}$, using the weighting function $W$.
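A compact way to realize the estimate of Eq. (1) over the extended parameter vector $P$ is a weighted k-nearest-neighbour lookup over the recorded runs. The sketch below is a hypothetical illustration: the inverse-distance weighting and the normalization by the total weight are our assumptions; any weighting function that favours records close to the query fits the model.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// One record of a previous run: the evaluation parameter vector
// P = [I^in_1..I^in_n, pi_1..pi_r] and the observed execution time in seconds.
struct RunRecord {
    std::vector<double> p;
    double seconds;
};

// Weighted k-NN estimate of the execution time for a new parameter vector.
// Assumes all vectors have the same dimension n + r.
double estimateRunTime(const std::vector<RunRecord>& history,
                       const std::vector<double>& p, std::size_t k) {
    std::vector<std::pair<double, double>> byDistance;  // (distance, seconds)
    for (const RunRecord& r : history) {
        double d2 = 0.0;
        for (std::size_t i = 0; i < p.size(); ++i) {
            double diff = r.p[i] - p[i];
            d2 += diff * diff;
        }
        byDistance.emplace_back(std::sqrt(d2), r.seconds);
    }
    std::sort(byDistance.begin(), byDistance.end());     // nearest records first
    k = std::min(k, byDistance.size());

    double weighted = 0.0, totalWeight = 0.0;
    for (std::size_t i = 0; i < k; ++i) {
        double w = 1.0 / (byDistance[i].first + 1e-9);    // closer runs weigh more
        weighted += w * byDistance[i].second;
        totalWeight += w;
    }
    return totalWeight > 0.0 ? weighted / totalWeight : 0.0;
}
```

With the extended vector $P$, records obtained on machines with different performance vectors contribute to the estimate as well, which is exactly the generalization introduced above.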
IV. HEFT ALGORITHM FOR THE PROBLEM-ORIENTED SCHEDULING
We propose a list-based algorithm for problem-oriented scheduling in cloud environments based on computing profiles. List-based scheduling involves defining the priorities of the computational units and starting execution according to the obtained priority; resources are bound to high-priority tasks first. The proposed approach allows us to take into account the costs of data transmission between nodes, thereby reducing the total execution time of the workflow. The proposed algorithm is based on the Heterogeneous Earliest-Finish-Time (HEFT) algorithm [17], but contains modifications in the node-level computation phase and takes into account the incoming communication cost of a task's parent tasks.
Let $|T_i|$ be the size of the task $T_i$, and let $R$ be the set of computing resources with an average processing power $\bar{R} = \frac{1}{n}\sum_{i=1}^{n} R_i$. Then the average time to complete the task on the available resources is calculated as
$E(T_i) = \frac{|T_i|}{\bar{R}}$   (2)
Let $T_{ij}$ be the amount of data transferred between tasks $T_i$ and $T_j$, and let $R$ be the set of available resources with an average data transfer capacity $\bar{R} = \frac{1}{n}\sum_{i=1}^{n} R_i$. Then the average estimate of the data transfer cost between tasks $T_i$ and $T_j$ over all pairs of resources is
$D(T_{ij}) = \frac{T_{ij}}{\bar{R}}$   (3)
Thus, the priority of a computational unit may be defined as
$rank(T_i) = E(T_i) + \max_{T_j \in succ(T_i)} \big( D(T_{ij}) + rank(T_j) \big)$   (4)
where $succ(T_i)$ is the set of tasks that depend on the task $T_i$. Thus, a task's priority is directly determined by the priorities of all its dependent tasks. Tasks are assigned to the resources as follows: once all the tasks on which it depends have been assigned, the task with the highest priority is appointed to the computing resource that provides the least time for the task [18].
Taking into account the specifics of the problem-oriented cloud computing environment, the following modifications are applied to this algorithm. Let $\mathcal{F}$ be the set of all functions that can be implemented in the subject area. Then a separate task $T_i$ is a function $f_i \in \mathcal{F}$ with a set of input data objects $\mathcal{I}_i^{in} = (I_1^{in}, \dots, I_n^{in})$: $T_i = f_i(\mathcal{I}_i^{in})$. We define $R$ as the set of virtual machines available for deployment, with mean production capacity $\bar{R} = \frac{1}{n}\sum_{i=1}^{n} R_i = \frac{1}{n}\sum_{i=1}^{n} \Pi_i$. In this case, the execution time can be evaluated with the following formula:
$E(T_i) = \tau(f_i, \bar{R}, \mathcal{I}_i^{in})$   (5)
where $\tau(f_i, \bar{R}, \mathcal{I}_i^{in})$ is the average execution time estimate for the function $f_i$ on a set of machines with mean production capacity $\bar{R}$, for the known values of the input parameters $\mathcal{I}_i^{in}$.
The model of problem-oriented services should take into account the amount of data returned by each task $T_i$. For this, the operator of the expected output $\nu(f, \mathcal{I}^{in})$ can be used, which returns the expected total size in bytes of the output data objects $\mathcal{I}^{out}$. Consequently, within the framework of the problem-oriented model, the following estimate of the data transmission time between two tasks can be used:
$D(T_{ij}) = \nu(f_i, \mathcal{I}_i^{in}) \cdot R_{ij}$   (6)
where $R_{ij}$ characterizes the bandwidth of the data transmission channel in the cloud computing system. During the execution of a task it can be estimated as one of the following values: 1) $R = 0$, when the data transmission channel is confined to a single node; 2) $R = \beta_{group}$, when the data transmission channel is shared by a group of nodes; 3) $R = \beta_{cluster}$, when the data transmission channel is shared by a cluster of compute nodes.
Figure 1 shows the pseudo-code of the algorithm for problem-oriented workflow scheduling in a cloud computing environment based on computing profiles.
PROCEDURE: PO-HEFT
INPUT: TaskGraph G(T, E), TaskDistributionList, ResourcesSet R
BEGIN
  for each t in T from task graph G
    approximate the task execution time according to (5)
  for each e in E from task graph G
    approximate the data transfer time according to (6)
  start a breadth-first search in reverse task order and calculate a rank for each task according to (4)
  while T has unfinished tasks
    TaskList <- get completed tasks from task graph G
    ScheduleTask(TaskList, R)
    update TaskDistributionList
END
PROCEDURE: ScheduleTask
INPUT: TaskList, ResourcesSet R
BEGIN
  sort TaskList in reverse task rank order
  for each t from TaskList
    r <- get the resource from R that can complete t earliest
    schedule t on r
    update the status of r
END
Fig. 1. Problem-oriented heuristic scheduling algorithm PO-HEFT
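The rank computation of Eq. (4) and the greedy assignment step of Fig. 1 can be sketched as follows. This is an illustrative simplification (resource availability is reduced to a single ready time and the execution estimate is scaled by a per-resource speed factor), not the Java implementation evaluated in Section V:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Task graph with the per-task execution estimate E (Eq. (5)) and the
// per-edge data transfer estimate D (Eq. (6)) assumed to be precomputed.
struct Dag {
    std::vector<double> E;               // E(T_i)
    std::vector<std::vector<double>> D;  // D(T_ij)
    std::vector<std::vector<int>> succ;  // successors of each task
};

// rank(T_i) = E(T_i) + max over successors of (D(T_ij) + rank(T_j)), Eq. (4).
std::vector<double> computeRanks(const Dag& g) {
    std::vector<double> rank(g.E.size(), -1.0);
    std::function<double(int)> rec = [&](int i) -> double {
        if (rank[i] >= 0.0) return rank[i];
        double best = 0.0;
        for (int j : g.succ[i])
            best = std::max(best, g.D[i][j] + rec(j));
        return rank[i] = g.E[i] + best;
    };
    for (std::size_t i = 0; i < g.E.size(); ++i) rec(static_cast<int>(i));
    return rank;
}

// ScheduleTask step: place ready tasks, highest rank first, on the resource
// that can finish them earliest (speed[r] scales the execution estimate).
void scheduleReadyTasks(const Dag& g, const std::vector<double>& rank,
                        std::vector<int> readyTasks,
                        std::vector<double>& readyTime,
                        const std::vector<double>& speed,
                        std::vector<int>& placement) {
    std::sort(readyTasks.begin(), readyTasks.end(),
              [&](int a, int b) { return rank[a] > rank[b]; });
    for (int t : readyTasks) {
        int bestR = 0;
        double bestFinish = readyTime[0] + g.E[t] / speed[0];
        for (std::size_t r = 1; r < readyTime.size(); ++r) {
            double finish = readyTime[r] + g.E[t] / speed[r];
            if (finish < bestFinish) { bestFinish = finish; bestR = static_cast<int>(r); }
        }
        placement[t] = bestR;
        readyTime[bestR] = bestFinish;
    }
}
```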
V. ALGORITHM IMPLEMENTATION AND PERFORMANCE EVALUATION
In order to assess the proposed algorithm's efficiency, we developed a benchmark using the WorkflowSim cloud environment simulation platform. We implemented the PO-HEFT algorithm itself, as well as a naive brute-force algorithm that finds an ideal scheduling solution. The algorithm was implemented as a number of Java classes so that WorkflowSim can use it as the simulated cloud environment's scheduler. We implemented both a custom DatacenterBroker, in order to schedule VMs in a data center, and a custom CloudletScheduler, in order to schedule tasks (cloudlets in CloudSim's and WorkflowSim's terminology) on a single VM.
The algorithm was tested in a simulation in which virtual machines with homogeneous characteristics were deployed. The simulated system was given the same workflow 60 times, which greatly exceeds the capacity of the system. For the distribution of the workflow we used: a scheduler that does not use information about previous system runs, which is built into WorkflowSim itself; the perfect scheduler, which implements ideal scheduling through complete enumeration of the search space; and a scheduler based on the PO-HEFT algorithm, which uses information about previous runs. The computational complexity of the perfect scheduler does not allow its usage in any non-trivial simulation and, therefore, this algorithm is not present in the comparison. We have also tested several other algorithms, such as plain HEFT, particle swarm optimization, and a genetic algorithm; a detailed comparison with them will be a topic for further research. We plan to implement the developed cloud system and model the behavior of the DAG, POS, and Min-min algorithms in order to assess their efficiency.
VI. CONCLUSION
In this article, we assessed current scheduling algorithms and defined a model that allows evaluating various cloud computing environment metrics for problem-oriented scheduling. Next, we described the PO-HEFT scheduling algorithm, which aims to provide workflow scheduling in heterogeneous cloud environments. The main distinctive feature of this algorithm is its ability to adapt the solution based on previous runs, which allows it to provide better resource utilization. The algorithm's efficiency was assessed in the CloudSim cloud environment simulation software with the help of its WorkflowSim extension. As a benchmark we used CloudSim's built-in scheduler, the "space-shared scheduling policy", which uses round-robin for resource provisioning and virtual machine creation. Our proposed algorithm has shown significant efficiency gains over this simple scheduler. As further development, we will investigate the possibility of deploying this algorithm on a real cluster in order to assess its real-life, non-simulated performance. We will also compare this algorithm against different algorithms that do not use information about previous runs, in order to give empirical proof that this is a viable heuristic in workflow scheduling. We plan to extend the algorithm in order to schedule not only tasks on machines but also machine provisioning on virtual nodes.
REFERENCES
[1] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, "Knowledge-based expressive technologies within cloud computing environments," Adv. Intell. Syst. Comput., vol. 279, pp. 1–11, 2014.
[2] D. I. Savchenko, G. I. Radchenko, and O. Taipale, "Microservices validation: Mjolnirr platform case study," 2015 38th Int. Conv. Inf. Commun.
Technol. Electron. Microelectron. MIPRO 2015 - Proc., pp. 235–240, 2015.
[3] J. Thones, "Microservices," IEEE Softw., vol. 32, no. 1, pp. 116–116, 2015.
[4] H. Kobashi, S. Kawata, Y. Manabe, M. Matsumoto, H. Usami, and D. Barada, "PSE park: Framework for problem solving environments," J. Converg. Inf. Technol., vol. 5, no. 4, pp. 225–239, 2010.
[5] E. Deelman, D. Gannon, M. Shields, and I. Taylor, "Workflows and e-Science: An overview of workflow system features and capabilities," Futur. Gener. Comput. Syst., vol. 25, no. 5, pp. 528–540.
[6] A. Shamakina, "Brokering service for supporting problem-oriented grid environments," UNICORE Summit 2012, Proc., vol. 15, pp. 67–75, 2012.
[7] M. Guzek, D. Kliazovich, and P. Bouvry, "A Holistic Model for Resource Representation in Virtualized Cloud Computing Data Centers," IEEE Int. Conf. Cloud Comput. Technol. Sci., pp. 590–598, 2013.
[8] P. Bouvry and B. Service, "A Survey of Evolutionary Computation for Resource Management of Processing in Cloud Computing," no. May, pp. 53–67, 2015.
[9] C. Chen, J. Liu, Y. Wen, and J. Chen, "Research on Workflow Scheduling Algorithms in the Cloud," CCIS, vol. 495, pp. 35–48, 2015.
[10] D. Kliazovich, J. E. Pecero, A. Tchernykh, P. Bouvry, S. U. Khan, and A. Y. Zomaya, "CA-DAG: Modeling Communication-Aware Applications for Scheduling in Cloud Computing," J. Grid Comput., 2015.
[11] A. Hirales-Carbajal, A. Tchernykh, R. Yahyapour, J.-L. Gonzalez-Garcia, T. Roblitz, and J. M. Ramirez-Alcaraz, "Multiple workflow scheduling strategies with user run time estimates on a Grid," J. Grid Comput., vol. 10, no. 2, pp. 325–346, 2012.
[12] A. Tchernykh, L. Lozano, U. Schwiegelshohn, P. Bouvry, J. E. Pecero, S. Nesmachnow, and A. Y. Drozdov, "Online Bi-Objective Scheduling for IaaS Clouds Ensuring Quality of Service," J. Grid Comput., 2015.
[13] J. Yu, R. Buyya, and K. Ramamohanarao, "Workflow Scheduling Algorithms for Grid Computing," Springer Berlin Heidelb., vol. 146, pp. 173–214, 2008.
[14] L. B. Sokolinsky and A. V. Shamakina, "Methods of resource management in problem-oriented computing environment," Program. Comput. Softw., vol. 42, no. 1, pp. 17–26, 2016.
[15] J. J. Dongarra, P. Luszczek, and A. Petitet, "The LINPACK benchmark: Past, present and future," Concurr. Comput. Pract. Exp., vol. 15, no. 9, pp. 803–820, 2003.
[16] "WPrime Systems. Super PI. 2013." [Online]. Available: http://www.superpi.net/. [Accessed: 14-Nov-2015].
[17] H. Topcuoglu, S. Hariri, and M.-Y. Wu, "Performance-effective and low-complexity task scheduling for heterogeneous computing," IEEE Trans. Parallel Distrib. Syst., vol. 13, no. 3, pp. 260–274, 2002.
[18] S. Chen, Y. Wang, and M. Pedram, "Concurrent placement, capacity provisioning, and request flow control for a distributed cloud infrastructure," p. 279, 2014.
201 Towards a Novel Infrastructure for Conducting High Productive Cloud-Based Scientific Analytics ∗ Peter Brezany∗ , Thomas Ludescher† and Thomas Feilhauer‡ Research Group Scientific Computing, Faculty of Computer Science, University of Vienna, Vienna, Austria and SIX Research Centre, Brno University of Technology, Brno, Czech Republic Email: peter.brezany@univie.ac.at † Department of Computer Science, University of Applied Sciences, Dornbirn, Austria Email: thomas@ludescher.at ‡ Department of Computer Science, University of Applied Sciences, Dornbirn, Austria Email: thomas.feilhauer@fhv.at Abstract—The life-science and health care research environments offer an abundance of new opportunities for improvement of their efficiency and productivity using big data in collaborative research processes. A key component of this development is e-Science analytics, which is typically supported by Cloud computing nowadays. However, the state-of-the-art Cloud technology does not provide an appropriate support for high-productivity e-Science analytics. In this paper, we show how productivity of Cloud-based analytics systems can be increased by (a) supporting researchers with integrating multiple problem solving environments into the life cycle of data analysis, (b) parallel code execution on top of multiple cores or computing machines, (c) enabling safe inclusion of sensitive datasets into analytical processes through improved security mechanisms, (d) introducing scientific dataspace–a novel data management abstraction, and (e) automatic analysis services enabling a faster discovery of scientific insights and providing hints to detect potential new topics of interests. Moreover, an appropriate formal productivity model for evaluating infrastructure design decisions was developed. The result of the realization of this vision, a key contribution of this effort, is called the High-Productivity Framework that was tested and evaluated using real life-science application domain addressing breath gas analysis applied e.g. in the cancer treatment. Keywords-productivity model, scientific analysis, scientific studies, cloud computing, security, breath gas analysis I. I NTRODUCTION e-Science refers to the modern large scale science that is increasingly being carried out through distributed global collaborations enabled by the Internet. Typically, a feature of such collaborative scientific enterprises is that they require access to very large data collections and very large scale computing resources back to the individual user scientists. A key component of this development is e-Science analytics, a dynamic research field, that includes rigorous and sophisticated scientific methods of data preprocessing, integration, analysis, and visualization. Unlike traditional business analytics, e-Science analytics has to deal with huge, complex, heterogeneous, and very often geographically distributed datasets, the volume of which is already 202 measured in petabytes. Because of the huge volume and high-dimensionality of data, the associated analytics tasks are typically data and compute intensive; therefore, they are characterized together as a resource-intensive analytics. Moreover, a high level of security has to be guaranteed for many analytical applications, e.g., in finance and medical sectors. So far, the main focus has been on the functionality of Cloud-based analytics systems, and not on the productivity aspects associated with their development, use and impacts on expected scientific discoveries. 
In the high performance computing domain, a special US DARPA program and national programs in several countires have been focused on providing a new generation of economically viable high productivity computing systems for national security and for the industrial and research user communities. We believe that a similar effort is needed in the Cloud research domains, especially, e-Science analytics. In scientific analysis, when time-consuming calculations are performed, the execution time can be improved by using more and more CPU power and calculating the results in parallel, but the costs for execution will often be increased with the reduced time consumption. Evaluating the productivity helps companies and research institutes to achieve the best possible output with minimal costs and time. Our research presented in this paper addresses exactly this issue. Its main contribution is a formal model enabling to evaluate and predict key productivity paramenters associated with the development and execution of modern scientific analysis software systems. Furthermore, we address several other potential sources enabling to increase the productivity and provide appropriate original solutions. For better understanding, we provide several example scenarios. The productivity model and the different approaches are combined in the High-Productivity Framework (HiProF) that we designed and implemented. Its prototype software, documentation, users guide, and a test infrastructure are available on the Internet. MIPRO 2016/DC VIS Figure 1. Research Questions and the HiProF Response. During our research, while developing a high productivity infrastructure for breath gass analysis called the ABACloud1 [1], we came across two major research questions that are shown in Figure 1 in the HiProF context. The first question “How can the productivity be estimated?” is very important to evaluate the effect of decisions (e.g., increase number of worker nodes to achieve the results in less time). Without a generally usable model, these changes cannot be measured or used to make the best decision in terms of increasing the overall productivity. The second question “How can the productivity be improved by technology?” deals with several different ways to increase the overall productivity in scientific analysis. The Security Concept, Code Execution Framework, Dataspace Concept, and Automatic Analysis Framework are four of the most promising contributions, which are used in different sample scenarios introduced in the subsequent paragraphs. The Security Concept compares the productivity influence with and without using suitable security mechanisms. The Code Execution Framework (CEF) enables researchers to execute different Problem Solving Environment (PSE) codes in parallel in a cloud-based infrastructure, even if the corresponding PSE is not installed locally. We proposed and realized the concept 1 It primarily supports the development and reproducibility of scientific studies in the context of Problem Solving Environments, such as MATLAB, R, and Octave and Cloud-enabled workflows. MIPRO 2016/DC VIS of scientific dataspace that involves and manages numerous and often complex types of primary, derived and background data in an intelligent way that is based on the Semantic Web principles. The Automatic Analysis Framework (AAF) can be used to pre-analyze newly collected data to detect dependencies or data quality issues. 
As a next step, the AAF can be used to find new insights into existing research areas or even find new topics of interest. All these components (Formal Productivity Model, Security Concept, Dataspace Concept, Code Execution Framework, Automatic Analysis Framework) are part of HiProF. Developers of high-productivity infrastructures can select the HiProF technologies and integrate them in their own infrastructures. All provided technologies can be used optionally. If private data (e.g., patient data) are involved and user interactions in the infrastructure must be traced down to the lowest level of system (e.g., database), the provided cloudenabled security concept simplifies the development of the infrastructure to fulfill such a requirement. Installing and using the CEF in a high-productivity infrastructure enables researchers to execute problem solving environment codes (MATLAB, R, and Octave) in a cloud-based infrastructure. The CEF focuses on saving time and cost to increase the productivity. The AAF supports researchers during their daily work if continuous data is collected. Existing algorithms should be performed at these new data. Furthermore, the developer or administrator of the infrastructure can use the provided formal productivity model to make infrastructure decisions (e.g., size of the private cloud infrastructure). Depending on the requirements of the new infrastructure, the model can be simplified by reducing unnecessary cost variables (e.g., remove the license costs if only open source software is used or the required licenses are already available and do not generate additional costs). The rest of the paper is organized as follows. Section Background and Related Work contains a well known definition of productivity and the related work. Section Formal Productivity Model presents a general formula to calculate the productivity. The expected productivity improvements of the most promising approaches are discussed theoretically afterwards in Section Productivity Model Usage Scenarious. Finally, we briefly summarize the results achieved in Section Conclusions. II. BACKGROUND AND R ELATED W ORK This section provides a general definition of the term productivity and the related work addressing productivity in the field of computer science. A very common and general definition of productivity is listed at [2]: “Productivity is a measure relating a quantity or quality of output to the inputs required to produce it.” Productivity plays an important role in almost every discipline. It can be used to generate a better result, to reduce the accruing costs, to evaluate changes of infrastructure used, 203 to evaluate the impact of the quality of produced final results (e.g., scientific discoveries) on the appropriate domains. Productivity has been originally informally defined in economics2 [3] as the amount of output per unit of input, or, in other words, productivity is a ratio of production output to what is required to produce it (input). E.g.: In a computer factory the productivity can be calculated by the number of computers divided by the working hours needed, or inversely the number of working hours to assemble a computer. The productivity can be increased by rising the output or decreasing the working hours needed. This can exemplarily be done by (a) increasing the skills or motivation of the employees, (b) providing better working conditions (e.g., equipment), or (c) providing an automated plant. 
In the year 2004, a series of ten papers about the topic of High Performance Computing (HPC) productivity was published in the International Journal of High Performance Computing Applications [4], in which the different authors share their points of view on this topic. H. Zima organized the workshop "High-Productivity Programming Languages and Models" [5] in 2004. The goal of this workshop was to design a new high-productivity language system and to find a consensus on a set of common research strategies.
Figure 2. Different utility functions depending on time; modified from [6]. This figure shows different time-dependent utility functions. The users of the productivity model can choose the best fitting function for their needs.
III. FORMAL PRODUCTIVITY MODEL
J. Kepner [6] provides the following productivity formula:
$\text{Productivity} = \Psi = \frac{\text{Utility}}{\text{Costs}} = \frac{U(T)}{C} = \frac{U(T)}{c(T)}$   (1)
$U(T)$ represents the time-dependent utility function, which is described in Section Utility. $C$ stands for the total costs; the total costs can equally be described as a function of the required time, $c(T)$. The costs are further described in Section Costs. $T$ is the vector of all relevant time values (e.g., maintenance time, execution time, etc.). This formula is used in this paper to illustrate its support for decision making and for improving the overall productivity. (Footnote 2: The terms productivity and economy pursue partly the same goals; economy mainly tries to reduce the costs, whereas productivity additionally includes the required time and the result/output.)
Utility
The utility is the value a specific user or organization assigns to getting a certain answer in a certain time [6]. In general, the utility function is time dependent. A computer manufacturer has a higher benefit if the production process of a computer takes a couple of minutes instead of several hours. Furthermore, the benefit increases if the quality is improved or a more powerful computer is produced. Figure 2 shows different time-dependent utility functions. Figure 2(a) shows a continuously decreasing utility function. This function can be used for many different application areas: imagine that you are waiting for a bus. If the bus arrives exactly when you arrive at the bus station, you have the highest benefit; if you must wait for a while, the benefit continuously decreases. Figure 2(b) shows a step utility function. This kind of curve can, for example, be used for a weather forecast simulation: if the simulation takes longer than the time for which the weather should be predicted, the output is useless, because you have already seen the real weather. Figure 2(c) shows a constant function that can, for example, be used in a non-time-critical scientific analysis. Figure 2(d) shows a multi-step curve; for example, a 3D movie production needs to render its scenes after all changes have been made. The benefit is highest when the user can see the result right away or after a short coffee break. If he/she needs to wait overnight, the utility decreases, but at least the work can be continued on the next working day. It is useless if the user must wait, for example, one week to see the output of his/her changes, and in this case the utility is zero. A step function can be the right choice for a researcher who works with scientific analysis. If the analysis process needs only a couple of seconds, the researcher has the highest benefit.
If the computation needs some hours, the utility decreases, but at least the researcher has access to the results on the next working day. Nevertheless, the benefit is probably zero if the complete analysis needs much longer (e.g., years). Depending on the given problem, it is very important to choose the right utility function.
Costs
The total costs can be divided into total software costs $C_S$, ownership costs $C_O$, hardware/machine costs $C_M$, personnel costs $C_P$, and data costs $C_D$. We can express the same costs as a function of time, $(c_S + c_O + c_M + c_P + c_D) \times T$, where $c_x$ are the costs per unit of time and $T$ is a vector of all relevant time values. The software costs can further be divided into license costs $C_L$ for licensed software and full development costs $C_{DEV}$ for self-developed software ($C_S = C_L + C_{DEV}$). The ownership costs contain the energy costs $C_E$, costs for buildings $C_B$, and maintenance costs $C_{MA}$ ($C_O = C_E + C_B + C_{MA}$). Dealing with data usually implies additional costs. These costs accumulate during (a) data gathering $C_{GA}$ (e.g., collecting sensor data), (b) data generation $C_{GE}$ (e.g., generating a random time series), (c) data storage $C_{ST}$ (e.g., storing and backing up data), (d) data transfer $C_T$ (e.g., transmitting data to Amazon S3), and (e) data transformation or conversion $C_C$ (e.g., converting an HDF5 file [7] to a CSV file), so that $C_D = C_{GA} + C_{GE} + C_{ST} + C_T + C_C$.
$C_{total} = C_S + C_O + C_M + C_P + C_D$   (2)
$c_{total} = c_S + c_O + c_M + c_P + c_D$   (3)
$C_{total} = c_{total} \times T$   (4)
The formula for the total productivity:
$\Psi = \frac{U(T)}{C_S + C_O + C_M + C_P + C_D}$   (5)
$\Psi = \frac{U(T)}{(c_S + c_O + c_M + c_P + c_D) \times T}$   (6)
The last two overall formulas can be used to calculate the productivity in computer science. These costs contain all accumulated costs (e.g., development, usage, maintenance, license, data). Depending on the given scenario, several particular costs are zero or can be neglected. This model can be used to calculate the productivity in resource-intensive scientific analysis (data-intensive and/or compute-intensive).
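As a purely hypothetical illustration of how formula (6) is applied — all numbers below are invented for the example — consider an analysis with the following per-hour costs and a utility of 1000 for results delivered within a working day:

```latex
% Hypothetical numbers, only to show how (6) is applied.
\begin{align*}
  c_{total} &= c_S + c_O + c_M + c_P + c_D = 2 + 1 + 4 + 20 + 3
             = 30~\text{cost units per hour},\\
  \Psi      &= \frac{U(T)}{c_{total}\cdot T} = \frac{1000}{30 \cdot 10\,\mathrm{h}}
             \approx 3.3 \quad \text{for a run time of } T = 10\,\mathrm{h}.
\end{align*}
```

Halving the run time with unchanged utility doubles the productivity ($1000/(30\cdot 5) \approx 6.7$), whereas buying that speed-up with four times higher machine costs yields only $1000/(42\cdot 5) \approx 4.8$; this is exactly the kind of trade-off the model is meant to expose.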
IV. PRODUCTIVITY MODEL USAGE SCENARIOS
The following sections describe how productivity can be measured and increased by means of several example scenarios.
Using a Security Concept
The utility function does not only depend on the time, as shown in Section Utility. The quality of the output depends on the quality, volume, and availability of the input data as well. Poor quality results in a less valuable output of the data analysis (utility function), and the additional costs increase the total costs [8]. Both affect the overall productivity negatively. In many e-Science studies personal related data is involved. The privacy of this data must be assured, and all data has to be stored in a secure place during its whole lifetime. Without security, either this sensitive data must be anonymized or the researcher is not allowed to use this private data for the analysis. The original sensitive data is then not available for the analysis procedure, which possibly affects the volume of the available input data. The anonymization process can be almost impossible (e.g., if you need the date of birth and the patient's address for further analysis) or it can be time-intensive, which increases the total costs. If a researcher is not allowed to access all relevant data, the availability of the input data decreases, which determines the quality of the output as well. Both disadvantages decrease the total productivity. The data quality, data volume, or data availability related utility function can, for example, be a continuously increasing function or a normal step function (Figure 3). The step function can be selected, for example, in the following scenario: if security is available, all data can be used and the utility value is maximized; otherwise only a subset of the data is allowed to be used and the utility value is reduced.
Figure 3. Different utility functions, depending on data quality, data volume, or data availability. This figure shows different data-dependent utility functions.
It is hard to determine a precise rate for the utility in practice. The utility function of e-Science studies can be defined depending on the availability (with security mechanism) of the input data and on the time ($U(Time, Data) = U(t, D)$). To simplify the diagram, a continuously decreasing utility function for the time dependency (see Figure 2(a)) and a step function for the data quality (see Figure 3(b)) are used. Figure 4 shows the total utility function of the supposed scenario. The highest output can be reached if the result is available as soon as possible and if the complete data is available. Nevertheless, the security overhead increases the total costs as well ($\Psi = U\uparrow / C\uparrow$). (Footnote 3: The up arrows indicate that the expected value increases; a down arrow would indicate that the expected value decreases.) The security overhead (e.g., Kerberos overhead) mainly depends on the infrastructure used (e.g., network connection, execution time).
Figure 4. Total utility function. In this figure the utility function depends on time (continuously decreasing) and on the data (step function).
The following example shows how the productivity can be compared with and without security in general. To calculate the productivity threshold of a system with and without security, we define formula (7). For simplicity, we assume that the security overhead influences all costs equally (e.g., personnel costs, energy costs). $U(t, D)$ is the utility function without security features, $U_S(t, D)$ is the utility function with security, and $t_S$ is the time additionally required for the security (overhead).
$\frac{U(t, D)}{c_{total} \cdot t} < \frac{U_S(t + t_S, D)}{c_{total} \cdot (t + t_S)}$   (7)
Formula (7) can be rewritten as:
$t_S < t\left(\frac{U_S(t + t_S, D)}{U(t, D)} - 1\right)$   (8)
If we assume that (a) the utility value with security is larger than the utility value without security, $U_S(t + t_S) > U(t)$, and (b) the execution time is much larger than the security overhead, $t \gg t_S$, the productivity of the secure system is always better than the productivity of the insecure system. In e-Science studies where sensitive data (e.g., personal related data) is involved, the productivity increases when security is provided and more of the available data can be used for the analysis. Without a security concept, the data must be copied by hand, which is time-intensive and error-prone. The extra time increases the personnel costs and invalid data decreases the utility function. Both disadvantages affect the productivity negatively. More details of the security concept we have developed are presented in [9].
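To get a feeling for what threshold (8) means in practice, here is a hypothetical numeric reading (the numbers are invented, and the time decay of $U_S$ is ignored for simplicity):

```latex
% Hypothetical illustration of threshold (8); all numbers are invented.
% Assume t = 3600 s and that security raises the usable data volume such that
% U_S(t + t_S, D) / U(t, D) is roughly 1.25.
\[
  t_S \;<\; t\left(\frac{U_S(t+t_S,D)}{U(t,D)} - 1\right)
      \;=\; 3600\,\mathrm{s}\cdot 0.25
      \;=\; 900\,\mathrm{s},
\]
% i.e. the security overhead (e.g., Kerberos) may cost up to 15 minutes per run
% before it hurts productivity in this scenario.
```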
Dataspace Concept
Our dataspace system [10] involves and manages the primary data captured from the application, data derived by curation and analytics processes, background data including ontology and workflow specifications, semantic relationships between dataspace items based on scientific research ontologies, and available published data. Additionally, it provides advanced querying mechanisms about the contents and relationships in the dataspace and enables productive reproducibility of scientific studies. Our formal productivity model can be applied to this HiProF component in a similar way as shown in the previous subsection.
Using a Cloud-Based Data Analysis Infrastructure
A cloud-based data analysis infrastructure enables researchers to execute different problem solving environment (PSE) codes (e.g., MATLAB, R, and Octave code) in parallel in the cloud. The Code Execution Framework (CEF) [13] component of HiProF is a kernel system used in such an infrastructure. The same PSE method can be executed with different parameter sets in parallel (embarrassingly parallel problems). The CEF can be installed on a local Eucalyptus infrastructure, on Amazon EC2, or on any other Amazon-compatible cloud-based infrastructure. Different infrastructures can be combined into a hybrid cloud (e.g., a local Eucalyptus installation for general usage, with additional EC2 worker nodes for high workloads). Furthermore, it is possible to combine different PSE types within one single analysis (e.g., MATLAB code can be transmitted for execution at the CEF with the R client and vice versa). In [12] various scenarios are analyzed in detail using our formal productivity model. It is shown there that using the CEF results in a faster execution and that the productivity increases.
Automatic Analysis System
An automatic analysis system helps researchers to automatically receive information about existing data. For example, the Automatic Analysis Framework (AAF) [11] uses classification, prediction, and clustering algorithms to automatically analyze research data. The AAF is built upon the workflow management system Taverna, which is widely used in many different domains, and uses the data management system based on the Dataspace Concept [10]. It helps researchers to (a) find new topics of interest or (b) gain new insights in their existing research area. Figure 5 shows a general workflow of this automatic analysis. The AAF can be executed by the CEF continuously or at regular intervals (e.g., once per week).
Figure 5. General AAF Workflow. This data mining workflow is divided into (a) Data Preparation (input data, data selection, data preparation), (b) Data Analysis (e.g., linear methods, neural networks, principal component analysis), and (c) Result Presentation (report generation) sections.
For calculating the productivity, it is important to know that (a) all automatic analysis calculations have the lowest priority, meaning that they will only be executed at the CEF if no other calculation is waiting, and (b) the output will be evaluated and ranked automatically, depending on the percentage of the highest number of correctly classified test samples. The productivity of the AAF can be calculated as follows:
$\Psi = \frac{U(T)\uparrow}{C_S + C_E\uparrow + C_B + C_{MA}\uparrow + C_M + C_P\uparrow}$   (9)
The AAF uses the CEF only when no other calculation is waiting; otherwise it waits until the CEF is running idle. Therefore the software costs ($C_S$), building costs ($C_B$), and machine costs ($C_M$) are zero.
The energy costs $C_E$ and the maintenance costs $C_{MA}$ of the AAF increase because of the additional usage and further AAF configurations. After the analysis, a researcher must evaluate the result, which means that the personnel costs ($C_P$) rise. The system automatically ranks the results, which allows the researcher to skip non-promising ones.
$\Psi = \frac{U(T)\uparrow}{C_E\uparrow + C_{MA}\uparrow + C_P\uparrow}$   (10)
To increase the productivity, only the utility value must be larger than zero. The project leader must predict the utility function. If the project leader expects a valuable output of the system, the productivity will be increased when using the AAF in comparison to using no automatic analysis.
$U(T) > 0 \implies \Psi > 0$   (11)
V. CONCLUSIONS
This paper introduces a novel productivity model that can be used in resource-intensive scientific analysis software development processes. Productivity plays an important role in achieving the best results in terms of time, costs, and quality. Our model is an extended productivity formula from J. Kepner with additional cost variables (personnel costs, license costs, energy costs, building costs, maintenance costs, and data costs). These adaptations are necessary in order to be able to calculate the productivity of time-intensive and data-intensive scientific analysis. We defined different approaches to increase the overall productivity for these types of applications. Furthermore, we explained the usage of the productivity model with several example scenarios. The scenarios are mathematically described and the threshold formulas are provided. With these formulas and examples, an administrator can evaluate system adaptations (e.g., changing the infrastructure by adding additional worker nodes, using faster computers, using a hybrid Code Execution Framework, etc.) and determine how the changes affect the productivity. The presented framework was tested and evaluated using a real life-science application domain addressing breath gas analysis, applied e.g. in cancer treatment.
ACKNOWLEDGMENT
The research leading to these results has received funding from the Austrian Science Fund (Project No. TRP 77-N13) and the Czech National Sustainability Program supported by grant LO1401.
REFERENCES
[1] I. Elsayed et al., "ABA-Cloud: support for collaborative breath research," Journal of Breath Research, vol. 7, no. 2, p. 026007, 2013. [Online]. Available: http://eprints.cs.univie.ac.at/3981/
[2] About.com Economics, "Productivity - Dictionary Definition of Productivity," http://economics.about.com/od/economicsglossary/g/productivity.htm, Accessed July 2013.
[3] OECD, "Measuring Productivity - OECD Manual," http://www.oecd-ilibrary.org/content/book/9789264194519-en, 2001.
[4] International Journal of High Performance Computing Applications, "Table of contents, winter 2004, 18 (4)," http://hpc.sagepub.com/content/18/4, 2004.
[5] H. Zima, "Workshop on high-productivity programming languages and models," http://www.cs.illinois.edu/homes/wgropp/bib/reports/hpl report.pdf, 2004.
[6] J. Kepner, "High Performance Computing Productivity Model Synthesis," International Journal of High Performance Computing Applications, vol. 18, no. 4, pp. 505–516, 2004. [Online]. Available: http://hpc.sagepub.com/content/18/4/505.abstract
[7] The HDF Group, "HDF5," http://www.hdfgroup.org/HDF5, Accessed Dec 2012.
[8] D. M. Strong et al., "Data quality in context," Commun. ACM, vol. 40, no. 5, pp. 103–110, May 1997.
[9] T. Ludescher, T. Feilhauer, and P.
Brezany, “Security Concept and Implementation for a Cloud Based E-science Infrastructure,” 2012 Seventh International Conference on Availability, Reliability and Security, vol. 0, pp. 280–285, 2012. [10] I. Elsayed and P. Brezany, “Dataspace support platform for e-science,” Computer Science, vol. 13, no. 1, 2012. [Online]. Available: http://journals.agh.edu.pl/csci/article/view/14 [11] T. Ludescher et al., “Towards a high productivity automatic analysis framework for classification: An initial study,” in Advances in Data Mining. Applications and Theoretical Aspects, P. Perner, Ed. LNCS, Springer, 2013, vol. 7987, pp. 25–39. [12] T. Ludescher, “Towards High-Productivity Infrastructures for Time-Intensive Scientific Analysis,” Ph.D. Thesis, Faculty of Computer Science, University of Vienna, Austria, 2013. [13] T. Ludescher, T. Feilhauer, and P. Brezany, “CloudBased Code Execution Framework for scientific problem solving environments,” Journal of Cloud Computing: Advances, Systems and Applications, vol. 2, no. 1, p. 11, 2013. [Online]. Available: http://www.journalofcloudcomputing.com/content/2/1/11 207 An OpenMP runtime profiler/configuration tool for dynamic optimization of the number of threads Tamara Dancheva, Marjan Gusev, Vladimir Zdravevski, Sashko Ristov Ss. Cyril and Methodius University, Faculty of Computer Science and Engineering 1000 Skopje, Macedonia Email: tamaradanceva19933@gmail.com, {marjan.gushev, vladimir.zdraveski, sashko.ristov}@finki.ukim.mk Abstract—This paper describes the implementation and the experimental results of a tool for dynamic configuration of the number of threads used in the OpenMP environment based on the current state of the runtime at the time of the call. For this purpose, we use a mix of profiling and machine learning techniques to determine the number of threads. The decision to set the number of threads is made at the time of the call. The proposed approach is designed to be cost effective in the scenario of a highly dynamic runtime primarily when running relatively long tasks consisting of a number of parallel constructs. I. I NTRODUCTION The OpenMP API was designed with a goal to provide a simple, fast, compact and portable way of parallelizing programs [1]. Computer architectures have diversified and evolved a great deal since the first release requiring OpenMP to evolve as well. Today, considering the prevalence of SMPs and SMT on the market, the massive virtualization and ccNUMA architectures, the default schemes that have sufficed in the past are no longer producing the desired results with the same consistency as before. Multi-cache architectures, hyper threading and multi-core techniques helped prevent the potential stagnation in performance due to the limits imposed by physics with a cost. The intensive exploitation of virtualization techniques especially lately (of memory, computing power, application, operating systems) only enhance the complexity introduced. In order to get a good performance, programmers as well as scientists have had to start studying the hardware characteristics, cache associativity, swapping, virtualization and many other low-level hardware and software characteristics. The main problem is the diversity of resource sharing mechanisms in different architectures, most notably the SMT [2]. OpenMP has a default scheme that is implementation specific, meaning that OpenMP implementations use different default policies to assign and interpret the internal control variables (ICVs). 
Additionally, in virtual machines, the number of threads is affected by the way different hypervisors present and assign processors and virtual cores to the virtual machine. The default scheme is static and tends to use the number of physical cores as the default number of threads. This does not exploit hyper-threading automatically, and whether that is a good or a bad decision depends on the interconnection, location, and resource sharing policies and mechanisms of the processors involved, as well as on how the program utilizes them.
Having a tool that can automatically learn the formula for the optimal number of threads, without previously gathering explicit user feedback on the program to be run, can significantly automate the process of producing better performing algorithms. Initially, a suite of benchmarks is run concurrently using pThreads. Using profiling to capture the state of the runtime while running the suite, an output dataset is created. This dataset is used as input for the creation of random trees [3] (the random forest algorithm [4]). This is a machine learning algorithm that uses a supervised approach to train a number of decision trees that all vote on the number of threads, i.e., classify the current runtime state; it is designed in such a way as to avoid overfitting (generating bad predictions because a tree is too specific to a subset of all the possible inputs or runtime states).
In this paper we aim at testing the validity of the following hypotheses:
H1: There exists a subset or multiple subsets of dataset attributes for which the predictions generated by the random forest/tool do not degrade performance in any of the test cases relative to the performance achieved by assigning the number of cores to the num_threads OpenMP variable.
H2: There exists a subset or multiple subsets of dataset attributes for which the tool boosts performance in comparison with the default OpenMP scheme, either by increasing or decreasing the number of threads.
Further details are disclosed in the following sections. Implementation details of the proposed approach are given in Section II. The experiments are described in Section III and the results in Section IV. Relevant discussion is given in Section V, related work in Section VI, and conclusions in Section VII.
II. IMPLEMENTATION DETAILS
This section describes the approach used to optimize the number of threads for an OpenMP program. Retrieval of the runtime parameters of an OpenMP program is done using the SIGAR API [5]. The SIGAR API is an open source cross-platform library that helps in parsing and locating major system parameters, including CPU usage, RAM usage, and virtual memory I/O traffic (total and by process), as well as more detailed information about the processes, the hardware features, and the operating system running on the machine.
TABLE I. LIST OF ATTRIBUTES USED
num threads          number of threads for one run
cpu usage            cpu usage for all processes
pid cpu usage        cpu usage for the current process
ram usage            ram usage for all processes
pid ram usage        ram usage for the current process
pid page faults      page faults for the current process
vm usage             virtual memory usage for all processes
pid vm usage         virtual memory usage for the current process
pid vm read usage    virtual memory reads percentage for the current process
pid vm write usage   virtual memory writes percentage for the current process
processes            # of processes and threads created
procs running        # of running processes and threads
procs blocked        # of blocked processes and threads
An indicator of its quality is that it is a framework suitable for use in cloud environments for measuring hypervisor performance [6]. It offers an extensive set of utilities to monitor the state of a highly dynamic runtime, which makes it a suitable tool for the kind of measurements needed here. A sample of all potential dataset attributes is given in Table I.
PThreads is an implementation of thread-level parallelism [7]. It is used to concurrently run a suite of OpenMP benchmarks. The benchmarks are run for a range of threads [1, max_num_threads], constituting one run. This is done 50 times per benchmark by default. The optimal number of threads from the specified range is found using the best execution time yielded during each consecutive run. The corresponding runtime state is then written to disk. The OpenMP algorithm which utilizes the tool has to call the omp_set_num_threads function before each parallel block to invoke the wrapper function. The number of threads is predicted using the random forest and used as a parameter in the new handle which is returned.
III. DESCRIPTION OF EXPERIMENTS
The goal of running the experiments is to simulate as many states of the runtime as possible, so that a representative decision tree can be constructed out of the provided dataset. This is closely related to the parameters that are used to represent the state, and these have to be carefully configured so that a performance boost can be obtained. Random delays are enforced in between runs in order to accomplish this better. The dataset is boosted by generating additional dataset records from the existing records with repetition. Basically, the experiment is repeated a number of times, and this helps to break the independence, or decrease its negative impact on the decision trees that are generated out of the dataset, by adding a variance to the output that helps remove some of the initial bias. This works for small biases only. Additionally, boosting enables us to get p-values and provides a good basis for running statistical tests and deriving conclusions about our data. Therefore, the parameters have to be very carefully selected. Additionally, random forest does not require any normalization of the dataset, since it considers each attribute independently [4]. Each forest is created the first time the call to set_num_threads is made. After the initial configuration, the forest is used to predict the number of threads for the parallel region in the OpenMP algorithm. The overall results are 50-150 obtained values of elapsed times in both cases: by using the random forest for prediction, and by utilizing the OpenMP default schemes. The experimental environment is presented in Table II. The experiments are based on using 4 subsets of the parameters listed in Table I. The corresponding parameters are listed in Table III.
TABLE II. EXPERIMENTAL ENVIRONMENT FOR PROFILING
Benchmark programs:  Dijkstra, Multitask, Poisson [8]
Hardware:            HP Probook 650, i5-4210M, RAM 4GB
OS:                  Debian, GNU Linux 6.0
TABLE III. LIST OF ATTRIBUTES INCLUDED IN THE DATASETS
Dataset 1: all attributes listed in Table I
Dataset 2: num threads, cpu usage, pid cpu usage, lst cpu usage, ram usage, pid page faults, procs blocked, user, sys, idle, irq
Dataset 3: num threads, pid cpu usage, lst cpu usage, nice, idle, wait, irq, soft irq
Dataset 4: num threads, pid ram usage, ram usage, pid cpu usage, cpu usage, procs blocked, nice, idle, user, sys, wait, irq, soft irq, lst cpu usage
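The call-site behaviour described in Section II can be pictured with the following OpenMP/C++ sketch. The helper names are hypothetical placeholders for the tool's internals (the SIGAR sampling and the random-forest vote); only the omp_set_num_threads call placed before the parallel block corresponds directly to what the paper states.

```cpp
#include <omp.h>
#include <vector>

// Hypothetical stand-ins for the tool's internals (names are illustrative):
// the real tool samples SIGAR attributes (cpu usage, ram usage, procs blocked,
// ...) and lets the trained random forest vote on a thread count.
static std::vector<double> sampleRuntimeState() {
    return {};                          // placeholder: would query SIGAR here
}
static int predictNumThreads(const std::vector<double>& /*state*/) {
    return omp_get_max_threads();       // placeholder: would ask the forest here
}

void parallelRegion(std::vector<double>& data) {
    // The call site the paper describes: omp_set_num_threads is invoked before
    // each parallel block, so the decision is made at the time of the call.
    omp_set_num_threads(predictNumThreads(sampleRuntimeState()));

    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(data.size()); ++i)
        data[i] *= data[i];             // placeholder workload
}
```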
The overall performance is analyzed in two environments. No stress environment is the one where the experiments are executed with no additional workload. A stress environment is simulated using the stress program package for POSIX systems that imposes stress on the system (CPU, memory, I/O, disk stress) by starting various number of threads concurrently to the experiments. Three experiments are conducted as OpenMP default schemes and four more experiments using the tool with different dataset. To evaluate the speedup of using the tool and the default configuration we compare the corresponding elapsed times. Algorithm I conducts exhaustive search to find a combination of input gates to produce logical circuit with output 1 [8]. It has a repetitive structure without complex data dependencies and exploits a can high degree of parallelism. Algorithm II is an implementation of the Fast Fourier Transformation (FFT) algorithm with a lot of data-flow dependencies which do not allow a high level of parallelism. IV. R ESULTS A. Algorithm I in a No Stress environment Fig. 1 presents the results (elapsed times) obtained with the default OpenMP scheme setting the number of threads variable to 1, 2 and 4. The X-axis represents the number of conducted 209 Fig. 1. Performance of executing the Algorithm I in a no stress environment Fig. 3. Performance of executing the Algorithm II in a no stress environment with default OpenMP schemes Fig. 4. Performance of executing the Algorithm II in a no stress environment using the tool with Datasets 1 - 4 Fig. 2. Histograms for the 4 datasets for executing Algorithm I in a no stress environment. test runs. Four separate measurements are conducted to evaluate the tool performance results. Each measurement uses a specific dataset defined in Table III to construct the forest. It is used to predict the number of threads, labeled in the graph with Dataset 1-4. A more detailed information about the output number of predicted threads with the tool is presented in Fig. 2. A total of 150 test runs of Algorithm I is utilizing the same 4 forests used for the performance measurements in Fig. 1 labeled with Dataset 1-4 in the decision making. The default OpenMP scheme yields best results with 4 threads (value of the OpenMP num threads variable). Using this reference value for the optimal choice of the number of threads, Dataset 2 and Dataset 4 result in a forest that most accurately predicts the optimal number of threads. This is presented in Fig. 2 pointing out the 100% accuracy for Dataset 2 and 96% accuracy for Dataset 4. For machines with the same hardware configuration that by default use the number of physical cores (2 in this case it is 210 2 Threads), the best speedup is Sp(2T, D2) achieved using Dataset 2, with an average value of 1.876241, which confirms the Hypothesis H2. Additionally, the predictions do not include a number of threads less than the physical number of cores nor for the best performing Dataset 2 nor for any of the other datasets, confirming the Hypothesis H1. B. Algorithm II in a No Stress environment The results obtained executing the Algorithm II using the OpenMP default scheme are presented in Fig. 3 and for executing the tool with Datasets 1 - 4 in Fig. 4 The histogram of predicted threads for executing the Algorithm II in a no stress environment is presented in Fig. 5. 
The tool using Dataset 2 confirms the Hypothesis H2, boosting performance in comparison with the most common OpenMP default number of threads which corresponds to the number of cores, i.e 2, in the majority of the runs. It can be concluded that Dataset 4 results in slightly better performance according to the frequency of the Dataset 4 points that lie below Dataset2, which agrees with the minimum average out of these two datasets. Dataset 3 and Dataset 4 out of the four datasets give the most accurate results according to the average elapsed MIPRO 2016/DC VIS Fig. 7. Performance of executing the Algorithm I in a stress environment with Datasets 1 -4 Fig. 5. Histograms of predicted threads for executing Algorithm II in a no stress environment. Fig. 6. Performance of executing the Algorithm I in a stress environment with OpenMP default schemes time. However, the average elapsed time for this algorithm is generally not a good indicator of the whether the predictions optimize the performance since the intervals [mean-sd, mean +sd] of the measurements listed in Table IV all overlap with each other except for 1 thread. Therefore the graphs along with the significant attributes should be analyzed in order to determine which dataset results in the best optimization of the algorithm. Additionally, no dataset prediction degrades the performance, and results with a number of threads less than the number of physical cores Fig. 5. C. Algorithm I in a Stress environment The performances of executing the Algorithm I in stress environment are presented in Fig. 6 for default OpenMP schemes and in Fig. 7 for Datasets 1-4. Fig. 8 presents the histogram of predicted threads for executing the Algorithm I in stress environment and contains MIPRO 2016/DC VIS Fig. 8. Histogram of predicted threads for executing the Algorithm I in a stress environment information about the frequency associated with the predictions. Algorithm I exploits the parallelism at a high degree and is accentuated enough so that in a stress environment using OpenMP default scheme, it is utilizing more cores. Eventually, this yields a better performance, except in very rare cases with some overlap. This overlap is located in the interval [x − sd, x + sd] for a certain value of x as elapsed time. It is a case when the performance of executing the algorithm with a given dataset overlaps with the same interval for execution of an OpenMP scheme. The execution of the Algorithm I with maximum number of threads in the OpenMP default scheme yields to the best performance, as presented in Table IV). The maximum number of threads is consequently used as a reference value when comparing the tool results with the OpenMP default scheme results. 211 Fig. 9. Performance of executing the Algorithm II in a stress environment with default OpenMP schemes. Fig. 11. Histogram of predicted threads for executing the Algorithm I in a stress environment TABLE IV AVERAGE ELAPSED TIMES FOR THE EXPERIMENTS Fig. 10. Performance of executing the Algorithm II in a stress environment with Datasets 1-4. D. Algorithm II in a Stress environment Fig. 9 presents the performance of executing Algorithm II in a stress environment with OpenMP schemes and Fig. 10 with Datasets 1-4. The best performance in the OpenMP schemes results in the configuration with 4 threads judging by the average. The majority of input states benefit more from 2 threads than 4 threads when comparing the overlaps shown in the performance graphs. This happens due to the variation. 
It is not a negligible percentage, so it should be considered when comparing the tool results with the OpenMP default scheme results. The reference optimal number of threads is therefore not uniquely determined, but is closely related to the runtime. The results presented in Fig. 9 and Fig. 11 indicate that from the OpenMP default scheme, the optimal number of threads is leaning towards 4 threads. However the same as with Algorithm I in a stress environment, this preference will be reviewed more closely due to the higher variations of the results i.e the unpredictability caused by the the highly dynamic runtime state changes. V. D ISCUSSION Table IV presents the average values of elapsed times for executing the Algorithm I in a no stress environment (A1NS), 212 Test run 1 Thread 2 Threads 4 Threads Dataset 1 Dataset 2 Dataset 3 Dataset 4 A1NS 35.65783 16.79859 08.89994 11.95526 08.95332 12.16092 09.00934 A2NS 7.752695 5.192298 4.724949 4.831201 4.770104 4.640983 4.556192 A1S 94.67928 41.35630 25.33419 39.43302 25.10371 40.83514 31.29540 A2S 22.99271 13.52446 13.58029 13.70530 12.70467 13.42115 12.54097 TABLE V S PEEDUPS OBTAINED FOR THE EXPERIMENTS A1NS A2NS A1S A2S Test run 1 Thread 2 Thread 4 Thread 1 Thread 2 Threads 4 Threads 1 Thread 2 Threads 4 Threads 1 Thread 2 Threads 4 Threads Dataset1 2.982606 1.405121 0.744437 1.604713 1.074743 0.978007 2.401015 1.048773 0.541023 1.677651 0.986805 0.990874 Dataset2 3.982636 1.876241 0.994037 1.625267 1.088508 0.990534 3.771525 1.647418 0.849842 1.809784 1.064527 1.068921 Dataset3 2.932165 1.381358 0.731847 1.670486 1.118793 1.018092 2.318574 1.012763 0.522447 1.713170 1.007698 1.011857 Dataset4 3.957874 1.864575 0.987857 1.701573 1.139614 1.037039 3.025342 1.321481 0.681704 1.833408 1.078422 1.082874 Algorithm II in a no stress environment (A2NS), Algorithm I in a stress environment (A1S) and Algorithm II in a stress environment (A2S). The obtained values of speedups for the experiments are displayed in Table V. One can conclude that no dataset produces a speedup factor of less than 1 with the number of physical cores equal to the number of threads in (V). The exception is the Dataset 1 in stress environment with a value of 0.986805. Considering the standard deviation of the results, this value can be considered acceptable due to the stress environment. This evidence sup- MIPRO 2016/DC VIS ports hypothesis H1 for all the datasets. In summary, the performance evaluation yields the following conclusions. The speedups measured in the stress and no stress environment for both Algorithms I and II suggest that the predictions result in an optimized performance using either Dataset 2 or Dataset 4 for both Algorithms I and II. The indicators are the mean elapsed times combined with the conclusions made by analyzing the performance graphs that compare the results of executing the Datasets with the OpenMP default scheme results. The maximum number of threads results in best performance for both the no stress and stress environments. Since Algorithm I is scaled vertically more successfully than Algorithm II stress testing does not result in a different optimal number of threads. The tool predictions that yield the best performance using the maximum number of threads are Dataset 2 and Dataset 4 following closely behind with slightly lower speedups. The optimization/performance boost is done by taking advantage of hyperthreading. The successful optimization using Dataset 2 and Dataset 4 confirms Hypothesis H2 for Algorithm I. 
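For reference, the two quantities used in this discussion can be reproduced directly from Table IV: the speedup is the ratio of the average elapsed time of a default scheme to that of a dataset, and two measurements are treated as indistinguishable when their [mean − sd, mean + sd] intervals intersect. The short C++ sketch below is illustrative; the sd values are hypothetical placeholders, since Table IV reports only the means.

#include <cstdio>

// Two measurements are considered to overlap when their
// [mean - sd, mean + sd] intervals intersect.
static bool intervals_overlap(double m1, double sd1, double m2, double sd2) {
    return (m1 - sd1) <= (m2 + sd2) && (m2 - sd2) <= (m1 + sd1);
}

int main() {
    // Average elapsed times from Table IV, row A1NS.
    const double t_default_2t = 16.79859;  // OpenMP default scheme, 2 threads
    const double t_dataset2   = 8.95332;   // tool with Dataset 2
    // Speedup as reported in Table V: Sp(2T, D2) = 16.79859 / 8.95332, about 1.876241.
    std::printf("Sp(2T, D2) = %.6f\n", t_default_2t / t_dataset2);
    // Hypothetical standard deviations, only to demonstrate the overlap test.
    std::printf("overlap = %d\n", intervals_overlap(t_default_2t, 1.5, t_dataset2, 1.2));
    return 0;
}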
The optimal number of threads for Algorithm II in both the no stress and stress environment is not unique as in the previous case with Algorithm I. Therefore it requires more thorough graph analysis of the resulting performances. This analysis shows evidence that the tool succeeds in finding an optimal number of threads that varies from 2 to 4. Despite the high index of variance inevitably caused by the intensive stress testing, close examination of the graph/plots and the statistics of the datasets and the predictions, yields the same conclusion that Dataset 2 and Dataset 4 result again in a better performance than for the other datasets. This evidence confirms hypothesis H2 for Algorithm II. VI. R ELATED WORK Among the research work done in area of adaptive or dynamic runtime OpenMP configuration, one of the most recent works, is based on creating a Multilayer Perception neural network with one hidden layer with back propagation to determine the number of threads for a parallel region based on the external workload [9]. Even though different machine algorithm, benchmarks and profiling tools are used, there are similarities between the goals set and methodologies used between this paper and [9]. The benefits of the approaches to optimization presented in both papers are mainly demonstrated with increase in the workload. A machine learning technique is used to predict the number of threads and is tested in an unknown setting. Other similar work is done building up on [9] using other approaches other than neural networks in dynamic workload settings, such as reinforced learning and Markov decision processes for unsupervised learning [10]. Random forests are mentioned as a suggestion for another machine learning algorithm that can solve the problem. Substantial amount of work is done in creating loop scheduling policies using runtime information. Other research work MIPRO 2016/DC VIS demonstrate promising results by using an adaptive loop scheduling strategy targeting SMT machine [11]. VII. C ONCLUSION The conducted experiments and performance analysis provide evidence that supports the correctness of hypotheses H1 and H2. The main achievement is that the analysis tool provides a way to map the dynamic runtime state to the optimal number of threads in a real-time manner being able to respond to the changes in the runtime. The automation of this task becomes greater and greater advantage with increasing the number of processors and hyperthreading scale on a machine. Therefore the significance of this paper is to explore a way of automating optimization in OpenMP and set up a groundwork for developing a truly generic cross-platform optimization tool that requires minimum configuration to produce good predictions on the optimal number of threads. The results so far are encouraging, and the tool is able to correct the default scheme and find the optimal number of threads in cases of a highly dynamic runtime when the variation of the parameters is significant enough to provide indication of it. R EFERENCES [1] B. Barney. (2015, Aug) OpenMP. [Online]. Available: https://computing. llnl.gov/tutorials/openMP/ [2] M. Curtis-Maury, X. Ding, C. D. Antonopoulos, and D. S. Nikolopoulos, “An evaluation of openmp on current and emerging multithreaded/multicore processors,” in OpenMP Shared Memory Parallel Programming. Springer, 2008, pp. 133–144. [3] OpenCV dev team. (2014, Nov) OpenCV documentation, Random Trees. [Online]. Available: http://docs.opencv.org/3.0-beta/modules/ml/ doc/random trees.html [4] L. 
Breiman and A. Cutler. (2004) Random Forests. [Online]. Available: http://www.stat.berkeley.edu/∼breiman/RandomForests/cc home.htm [5] R. Morgan and D. MacEachem. (2010, Dec) Sigar - system information gatherer and reporter. [Online]. Available: https://support.hyperic.com/ display/SIGAR/Home [6] P. V. V. Reddy and L. Rajamani, “Evaluation of different hypervisors performance in the private cloud with sigar framework,” International Journal of Advanced Computer Science and Applications, vol. 5, no. 2, 2014. [7] B. Barney. (2015, Aug) POSIX Threads Programming. [Online]. Available: https://computing.llnl.gov/tutorials/pthreads/ [8] John Burkardt. (2011, May) C++ Examples of Parallel Programming with OpenMP. [Online]. Available: http://www.stat.berkeley.edu/ ∼breiman/RandomForests/cc home.htm [9] M. K. Emani, Z. Wang, and M. F. O’Boyle, “Smart, adaptive mapping of parallelism in the presence of external workload,” in Code Generation and Optimization (CGO), 2013 IEEE/ACM International Symposium on. IEEE, 2013, pp. 1–10. [10] M. K. Emani, “Adaptive parallelism mapping in dynamic environments using machine learning,” Ph.D. dissertation, The University of Edinburgh, 2015. [11] Y. Zhang, M. Burcea, V. Cheng, R. Ho, and M. Voss, “An adaptive OpenMP loop scheduler for hyperthreaded SMPs,” in ISCA PDCS, 2004, pp. 256–263. 213 An effective Task Scheduling Strategy in multiple Data centers in Cloud Scientific Workflow Esma Insaf Djebbar Ghalem Belalem Department of Computer Science University of Oran 1, Ahmed Ben Bella Oran, Algeria esma.djebbar@gmail.com Abstract— Cloud computing is currently the most hyped and popular paradigm in the domain of distributed computing. In this model, data and computation are operated somewhere in a cloud which is some collection of data centers owned and maintained by a third party. Scheduling is the one of the most prominent activities that executes in the cloud computing environment. The goal of cloud task scheduling is to achieve high system throughput and to allocate various computing resources to applications. The Complexity of the scheduling problem increases with the size of the task and becomes highly difficult to solve effectively. In this research, we propose a task scheduling strategy for Cloud scientific workflows based on gang scheduling in multiple Data centers. The experimentation shows the performance of the proposed strategy on response time and average cost of cloudlets. Keywords— Cloud scientific workflow, task scheduling, gang scheduling, multiple Data centers. I. INTRODUCTION Cloud computing is a distributed computing paradigm that mixes aspects of Grid computing, Internet computing, Autonomic computing, Utility computing, and Green computing. Cloud computing is derived from the servicecentric perspective that is quickly and widely spreading in the Internet Technology world. From this perspective, all capabilities and resources of a Cloud (usually geographically distributed) are provided to the users as a service, to be accessed through the Internet without any specific knowledge of, expertise with, or control over the underlying technology infrastructure that supports them. Cloud computing offers a user-centric interface, that acts as a unique point of access for users' needs and requirements. Moreover, it provides ondemand service provision, Quality of Service (QoS) guaranteed offer, and autonomous system for managing hardware, software, and data transparency to the users. 
Department of Computer Science, University of Oran 1, Ahmed Ben Bella, Oran, Algeria
ghalem1dz@gmail.com
Cloud computing has recently received considerable attention as a promising approach for delivering Information and Communication Technologies (ICT) services as a utility. In order to provide these services it is necessary to improve the utilization of data center resources, which operate in highly dynamic workload environments. Data centers are the essential parts of cloud computing. In a single data center, hundreds to thousands of virtual servers generally run at any instant of time, hosting many tasks, while at the same time the cloud system keeps receiving batches of task requests. In this context, one has to identify a few target servers, out of the many powered-on servers, which can fulfil a batch of incoming tasks. Task scheduling is therefore an important issue which greatly influences the performance of a cloud service provider. Traditional approaches used in optimization are deterministic, fast, and give exact answers, but often tend to get stuck in local optima. The task scheduling problem has an extremely large search space with a correspondingly large number of potential solutions, so finding the optimal answer takes much longer. There is no ready-made, well-outlined methodology for solving the problem under such circumstances. In the cloud, however, it is acceptable to find a near-optimal solution, preferably in a short period of time. This work is a continuation of our work presented in [7]. The rest of the article is organized as follows. Section 2 presents the related work. Section 3 introduces the basic strategy of the proposed approach, gives an example and analyzes the research problem. Section 4 demonstrates the simulation results and the evaluation. Finally, Section 5 addresses our conclusions and future works.
II. RELATED WORKS
There are many algorithms for scheduling in Cloud computing, and their main goal is to obtain high performance. The main examples of scheduling algorithms are First Come First Served (FCFS), Round-Robin (RR), the Min-Min algorithm, and the Max-Min algorithm.
A. FCFS Algorithm
The First Come First Served algorithm means that the task that comes first will be executed first.
B. Round-Robin Algorithm (RRA)
In this scheduling algorithm, time is given to resources in a time-slice manner.
C. Min-Min Algorithm
The Min-Min algorithm selects the smaller tasks to be executed first.
D. Max-Min Algorithm
The Max-Min algorithm selects the larger tasks to be executed first.
Scheduling in cloud computing can be categorized into three stages:
- Discovering resources and filtering them.
- Selecting a target resource (decision stage).
- Submitting a particular task to the target resource.
Lin and Lu [1] developed an algorithm for scheduling of workflows in service-oriented environments. Unlike algorithms for Grid systems, it is able to utilize dynamic provisioning of resources. However, it lacks the capability of considering the cost of resource utilization, which is required for its use in Cloud environments. In [2], a scheduling algorithm based on the cost of workflows for real-time applications is presented. The purpose of the algorithm is to develop a scheduler that minimizes the cost and still meets the time constraints imposed by the user. The workflow is divided into subsets of tasks for establishing a single flow.
Tasks that do not form a single flow are separated and each of them runs as an independent subset. In [3], a dynamic workflow scheduling strategy that considers the user/resource relationship is presented. In this approach the resources are not seen individually, but grouped. The scheduler selects the sites, and this selection is made by an opportunistic strategy. It aims to spread the tasks of the workflow across Grid sites based on their performance in previous submissions.
III. PROPOSED STRATEGY
In a Cloud computing system, several tasks run simultaneously and need frequent access to data. To run a task, the data must be aggregated, and this requires data movement. Therefore, if several tasks use the same data, they must be placed together to minimize the frequency of data movement. The proposed approach includes two important stages, each of which contains a set of operations to be performed. Figure 1 shows a global view of the approach.
Fig. 1. A global view of the proposed approach
A. Stage of construction
During the construction phase, we use a matrix model to represent the existing tasks. We form clusters in the task set by transforming the matrix, and then we distribute the data sets to different data centers as the original partitions to be used in the next step. First, we compute the task dependencies of all tasks and accumulate a matrix TM whose elements are TMij = dependencyij. The value is the dependency between the tasks Ti and Tj. It can be calculated by counting the data in common between the data sets of the tasks denoted Ti and Tj. Specifically, each diagonal element of TM is the number of data items that will be used by that task. TM is a symmetric matrix of dimension n*n, where n is the total number of existing datasets. The Bond Energy Algorithm (BEA) is applied to the matrix TM in order to group similar values together. Two measures, BEC and BEL, are defined for this algorithm. The permutation is done so that these measures (Formulas 1 and 2) are maximized. After applying the BEA to group similar values in the matrix, the TM matrix is partitioned, according to the number of data centers, into multiple clustered arrays named CM1, CM2, CM3, ...
B. Stage of scheduling
The dependency matrix (i.e. TM) is dynamically maintained in this phase. When new tasks are generated or added to the system by a user, we calculate their dependencies with all existing tasks and add them to the matrix TM. We take the dependency matrix (TM) as input and generate the clustering dependency matrix (CM). In the CM, the items with the same values are grouped together. Before worrying about the datasets that will be generated, we must first run the existing tasks. Since moving datasets from one data center to another is more expensive than scheduling tasks to a data center, a job scheduling algorithm is used (the scheduling algorithm).
Fig. 2. Average response time of cloudlets
In this algorithm, the technique used is based on the placement of datasets: the ready tasks are scheduled to the data center that contains the majority of the required datasets. A task is said to be ready if all required datasets belong to the set of existing datasets. Once the tasks are completed, new datasets are generated.
IV. EXPERIMENTATIONS AND RESULTS
In this section, we describe the experiments that we conducted to evaluate the proposed approach. Experiments were conducted with the CloudSim toolkit [4, 5, 6].
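Before turning to the experiments, the construction and scheduling rules described above can be summarized in a short C++ sketch: the dependency matrix TM is accumulated by counting the datasets two tasks share (indexed by tasks here), and a ready task is placed on the data center that already stores most of its required datasets. The data structures and names are illustrative assumptions, not the simulator code itself.

#include <set>
#include <vector>

// Each task is described by the set of dataset ids it needs.
struct Task { std::vector<int> datasets; };

// TM[i][j] = number of datasets shared by tasks i and j; TM[i][i] = |datasets(i)|.
std::vector<std::vector<int>> build_dependency_matrix(const std::vector<Task>& tasks)
{
    const size_t n = tasks.size();
    std::vector<std::vector<int>> TM(n, std::vector<int>(n, 0));
    for (size_t i = 0; i < n; ++i) {
        std::set<int> di(tasks[i].datasets.begin(), tasks[i].datasets.end());
        for (size_t j = 0; j < n; ++j) {
            int common = 0;
            for (int d : tasks[j].datasets) common += static_cast<int>(di.count(d));
            TM[i][j] = common;                 // symmetric by construction
        }
    }
    return TM;
}

// Placement rule: a ready task goes to the data center that stores the
// largest number of its required datasets, so data movement is minimized.
int pick_datacenter(const Task& t, const std::vector<std::set<int>>& dcDatasets)
{
    int best = 0, bestCount = -1;
    for (size_t dc = 0; dc < dcDatasets.size(); ++dc) {
        int count = 0;
        for (int d : t.datasets) count += static_cast<int>(dcDatasets[dc].count(d));
        if (count > bestCount) { bestCount = count; best = static_cast<int>(dc); }
    }
    return best;
}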
The objective of the CloudSim framework is to provide a generalized and extensible simulation environment that allows the modeling, simulation and experimentation of new cloud infrastructure and associated application services. In our work we used CloudSim version 3.0.3.
A. Average Response time
In this first simulation, we calculated the average response time with TimeShared and with TimeShared Clustering (the proposed strategy), for different numbers of cloudlets (20, 40, 60, 80, 100), each with a corresponding length. In CloudSim, the TimeShared algorithm can handle multiple requests (cloudlets) at the same time, but the cloudlets must share the computing power of the machine; the functioning of the TimeShared algorithm is equivalent to the Round Robin algorithm. The following figures (Figure 2 and Figure 3) show the resulting average response times.
Fig. 3. Average response time of cloudlets
Based on these results, we note that the average response time in TimeShared rises as the number of cloudlets increases, because many cloudlets are processed at once and it takes longer to process all of them. In TimeShared Clustering, the average response time is low because the cloudlets are divided among different data centers.
B. Average Cost of cloudlets
In this series of simulations, we calculated the average processing cost of the cloudlets with both algorithms (TimeShared and TimeShared Clustering). The bar charts in Figure 3 and Figure 4 present the main executions performed in that scenario.
Fig. 4. Average cost of cloudlets
Fig. 5. Average cost of cloudlets
The objective of this series of simulations is to study the impact of our application on the cost of processing cloudlets. Based on these results, we note that the average cost with the TimeShared algorithm is very high compared to TimeShared Clustering, because CPU utilization is lower in TimeShared.
V. CONCLUSION AND FUTURE WORKS
Cloud computing implies that data processing is not performed only on local computers, but in third-party data centers. It refers equally to applications delivered as a service over the Internet and to infrastructure delivered as a service, i.e. compute resources and/or storage. These technologies appear to be key solutions for companies and research teams with modest budgets. They make it possible to provide users with a set of applications without owning a lot of resources: the user connects to the cloud service provider's site and uses the applications offered, without realizing that different machines (virtual or not) are being accessed, and can also store personal data on remote servers. All of these services, which are available to users, are provided through the three cloud models, namely SaaS, PaaS and IaaS.
This work deals with the problem of task scheduling and its influence on the performance and cost of execution in IaaS clouds. The initial objectives were to propose a method to distribute costs and improve execution time, and to implement it and study the impact of our application on the performance of cloud computing. We have established simulations under the CloudSim simulator environment because it is the best and the most popular simulator for Clouds. The objective of this simulation was to experiment with several metrics (the wait time, the response time and the cost of processing) with the two algorithms based on Shared Time (First Come First Served) and SpaceShared, which is based on Round Robin.
This work seeks primarily to improve cloud services, including the execution time of applications and the financial cost. For this, we compared the two algorithms TimeShared and TimeShared Clustering. The results of the experiments carried out under the CloudSim simulator are encouraging and have met our expectations.
References
[1] C. Lin and S. Lu, "SCPOR: An Elastic Workflow Scheduling Algorithm for Services Computing," in Proc. Int'l Conf. SOCA, pp. 1-8, 2011.
[2] J. Yu, R. Buyya, and C. K. Tham, "Cost-based scheduling of scientific workflow application on utility grids," in E-SCIENCE '05: Proceedings of the First International Conference on e-Science and Grid Computing, pages 140-147, Washington, DC, USA, 2005.
[3] L. A. V. C. Meyer, D. Scheftner, J.-S. Vckler, M. Mattoso, M. Wilde, and I. T. Foster, "An opportunistic algorithm for scheduling workflows on grids," in VECPAR, volume 4395 of Lecture Notes in Computer Science, pages 1-12, 2006.
[4] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. D. Rose, and R. Buyya, "CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms," Softw., Pract. Exper., vol. 41, no. 1, pp. 23-50, Jan. 2011.
[5] Rajkumar Buyya, Rajiv Ranjan and Rodrigo N. Calheiros, "Modeling and Simulation of Scalable Cloud Computing Environments and the CloudSim Toolkit: Challenges and Opportunities."
[6] Rodrigo N. Calheiros, Rajiv Ranjan, Anton Beloglazov, César A. F. De Rose, and Rajkumar Buyya, "CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms."
[7] Esma Insaf Djebbar, Ghalem Belalem, "Optimization of Tasks Scheduling by an Efficacy Data Placement and Replication in Cloud Computing," in International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP), pages 22-29, December 2013.
Visualization in the ECG QRS Detection Algorithms
A. Ristovski*, A. Guseva*, M. Gusev**, S. Ristov**
* Innovation Dooel, Skopje, Macedonia
** Ss. Cyril and Methodius University, Skopje, Macedonia
info@innovation.com.mk
Abstract—Digital ECG data analysis is a trending concept in the field where applied computer science and medicine coincide. Therefore, in order to meet the requirements that arise, our R&D team has created an environment where developers can test different approaches in data processing. To assist this objective, the platform offers a number of features with the following main target goals: 1) to increase the effectiveness in conducting a proper medical diagnosis, 2) to incorporate a unified format for storing the results of the conducted diagnosis and 3) to test various ECG QRS detection algorithms.
Index Terms—ECG Analysis; Modular Software Architecture; AI Agent.
I. INTRODUCTION
The striding process of data digitalization has not left ECG data untouched.
For an instance, modern professional ECG instruments often feature a digital output in addition to the hard copied records. While hard copied records have the role of feasible hand-outs and provide medical personnel an immediate overview of a patient’s condition, the digitalized records have proven to be convenient for storage. Also, there are devices know as Holter monitors, that take continuous readings for a period of one or several days. Holter monitors assist in the diagnosis of conditions that require an extended keep up with the heart’s physiology. Their output is inevitably digitalized since it is impossible to keep the lengthly recording on paper. Not long ago, the medical gimmick market became abundant with gadgets that administer the user with simplified ECG readings, again, taking advantage of the most convenient storage form - digitalized data. No matter the level of exhaustiveness and accuracy of the information contained in the digitalized data, many opportunities rise in the filed of digital data processing. What is of greatest significance, is the opportunity to give a diagnosis on the recordings without the assistance of medical personnel, i.e. to replace human resources with a virtual Artificial Intelligence (AI) agent [1]. The extent of details provided in a conducted diagnosis vary according to the needs. In some cases, only statistical data targeting a specific medical condition or property is collected, and in others, a thorough analysis of the ECG recording is handled - to deliver an accurate cardiac profiling. Regardless the scope of the diagnosis, visualizing the outcome of the signal processing while testing different algorithms is of substantial benefit. Applying a filter to the signal 218 before it is processed can increase the effectiveness of the algorithm used for the processing. Therefore, for the purpose of having an insight how the filter has effected the initial signal, a graphic representation of the processed signal in contrast to a representation of the original one is a must. That way, by modifying the filters’ parameters and immediate observation of the results, the process of improving the effectiveness gains a significant advance. An ECG signal consists of a sequence of form-specific element set of typical waves that appear in a unambiguous pattern, at least to say unambiguous for most of time. These typical waves are referred to as components of the signal or characteristic points. Although the signal’s components can be easily distinguished by the human eye, it is quite the challenge to teach an AI agent how to do so. By visually marking which signal components have been detected and their exact point of detection, one can determine to what extent the detection algorithm has been successful. That way, the left-outs can be consequently analyzed and the algorithm can be modified accordingly, hence, doing corrections in a favorable direction. Some algorithms may even use pattern recognition in the detection of the signals’ components. By visualizing how does a component fit in the pattern it is supposed to fit in, the pattern can be improved in a way so that it better detects the components that have been left out during the detection. In addition, one of the more advanced features when using a pattern recognition is the ability to asses which part of the components deviates from the general form and in what way. Therefore, using visualization in the development of algorithms which do the ECG component detection is of great interest. 
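As an illustration of the pattern-recognition route mentioned above, the sketch below scores every position of the signal against a QRS template with normalized cross-correlation and annotates the positions whose score exceeds a threshold; plotted over the visualized signal, such annotations make the left-outs immediately visible. The template, the threshold and the function names are assumptions for the example, not the detection method of the tool described later.

#include <cmath>
#include <vector>

// Normalized cross-correlation between a signal window starting at `pos`
// and a QRS template; returns a score in [-1, 1].
double ncc(const std::vector<double>& sig, size_t pos, const std::vector<double>& tmpl)
{
    double ms = 0.0, mt = 0.0;
    for (size_t i = 0; i < tmpl.size(); ++i) { ms += sig[pos + i]; mt += tmpl[i]; }
    ms /= tmpl.size();  mt /= tmpl.size();
    double num = 0.0, ds = 0.0, dt = 0.0;
    for (size_t i = 0; i < tmpl.size(); ++i) {
        double a = sig[pos + i] - ms, b = tmpl[i] - mt;
        num += a * b;  ds += a * a;  dt += b * b;
    }
    return (ds > 0.0 && dt > 0.0) ? num / std::sqrt(ds * dt) : 0.0;
}

// Mark every sample index whose correlation with the template exceeds the
// threshold as a detected QRS complex (annotation = index, to be drawn in the GUI).
std::vector<size_t> detect_qrs(const std::vector<double>& sig,
                               const std::vector<double>& tmpl,
                               double threshold = 0.9)
{
    std::vector<size_t> annotations;
    if (sig.size() < tmpl.size()) return annotations;
    for (size_t pos = 0; pos + tmpl.size() <= sig.size(); ++pos)
        if (ncc(sig, pos, tmpl) > threshold)
            annotations.push_back(pos);
    return annotations;
}

A practical detector would additionally keep only the best-scoring position within a refractory window, so that one beat does not produce a cluster of neighboring annotations.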
The goal is to design an AI agent that will mimic the way a medic would interpret a reading. Although the AI agent itself does not need visualization of any kind to do its job, it is practically inevitable for the development process to feature the use of visualization modules, i.e. provide developers a visual insight in how the approach they are using is affecting the correlation between the input data - the ECG signal and the output data - a legitimate diagnosis. Whereas the paper is focused on the role of visualization in the AI agent design process, it also encloses an architectural design of an ECG diagnosis application that is not only aiming to deliver accurate results, but also, to incorporate functionalities for testing new approaches and improve overall effectiveness. MIPRO 2016/DC VIS The paper is organized as follows. The architectural design of the solution is given in Section II. Section III describes the architectural modules in detail and explains how the visualization has been implemented how it assists the process of devising the detection algorithms. An overview of existing related work is handed in Section IV. Section V discusses the conclusions and directions for future work. II. A RCHITECTURAL D ESIGN The features of a tool designed for delivering an ECG diagnosis are incorporated into the modules of a system with modular software architecture. Such design allows the advantages of distributed development, code re-usability and modular transparency, hence, the integration of new functionalities and the upgrade of existing ones can be done easily. The process of AI agent assisted diagnosing is very close to how a medic would act upon doing so, with respect to the steps needed to utilize the input and output formats that make a notable exception. The whole routine consist of several stages. These include: • Unpacking and interpreting the ECG recording, whatever the file format; • Pinpointing the signal components of the signal: – P waves; – QRS complexes; – T waves; • Analise the constellation and form of the signal components; • Set a medical diagnosis; • Save the annotations; In addition, the system is capable to evaluate the effectiveness of the applied algorithm on an extensive database of patients. It responds with several key effectiveness indicators, such as detection rate (percentage of detected features), hit rate (percentage of correctly detected features), miss rate (percentage of incorrectly detected features), and also the extra rate (percentage of extra determined features, which are not recognized by a human and have to be eliminated). This module is efficient in recognizing those parts of the ECG which caused problems (extra features or misses) and a user can automatically analyze them and improve the algorithm by customizing and adapting its parameters. The product reviewed in this paper has been designed in accordance with the stages of ECG analysis numbered above and is using a separate software module for each stage. Guidelines for pragmatic processing and several assisting visualization modules are also designed in accordance with the concept of modular architecture. They have been offered at disposal in order to make the whole process simpler and more transparent for developers. Although the architecture makes a notably pragmatic solution for the problem, what is more important is that it excels at increasing the effectiveness of the diagnostic capabilities. The overall architecture of the system is presented in Fig. 1. 
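One plausible formalization of the effectiveness indicators listed above compares the generated annotations with reference annotations inside a tolerance window; the sketch below is a hedged example (the tolerance, the structure names and the exact rate definitions are assumptions, since the formulas are not spelled out here).

#include <cstdlib>
#include <vector>

struct Rates { double detection, hit, miss, extra; };

// Compare detected annotation positions with reference positions.  A detection
// counts as a hit if it falls within `tol` samples of an unmatched reference.
Rates evaluate(const std::vector<long>& detected,
               const std::vector<long>& reference, long tol = 10)
{
    Rates out = {0.0, 0.0, 0.0, 0.0};
    if (reference.empty()) return out;
    size_t hits = 0;
    std::vector<bool> matched(reference.size(), false);
    for (long d : detected) {
        for (size_t r = 0; r < reference.size(); ++r) {
            if (!matched[r] && std::labs(d - reference[r]) <= tol) {
                matched[r] = true;
                ++hits;
                break;
            }
        }
    }
    out.detection = 100.0 * hits / reference.size();   // share of reference features found
    out.miss      = 100.0 - out.detection;             // share of reference features not found
    if (!detected.empty()) {
        out.hit   = 100.0 * hits / detected.size();    // share of detections that are correct
        out.extra = 100.0 - out.hit;                   // share of detections with no counterpart
    }
    return out;
}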
The next section gives a description on how each module deals MIPRO 2016/DC VIS Fig. 1. An overview of the system’s architecture with the problems in respect, and, moreover, focuses on the interest of implementing the modules for visualization. III. F UNCTIONAL D ESCRIPTION One of the fundamental ideas for an application with such architecture is interoperability. That is achieved by allowing data input in different file formats, hence, making it possible for the application to process output data from a variety of ECG sensors. The application’s input module handles the different data representations containing the actual ECG signal, so that the data is given a unified composition - an inevitable prerequisite for the next stage of the analysis. There are several standard medical record data formats for storing and retrieving ECG files. Although most of them are XLM based, still there are examples of binary files, such as those used by PhysioNet open source library [2]. The most profound is maybe the USA FDA standard HL7 [3] that specifies annotated ECGs and provides means to systematically evaluate the ECG waveforms and measurement locations. Bond et al. [4] review nine different formats used to store the ECG, such as SCP-ECG, DICOM-ECG, and HL7 aECG. The input module also does the consolidation of additional record information, such as sensor ID, time stamp, location, type of ECG lead being analyzed [5] and alike. The ECG signal visualization module incorporated into the application’s GUI is an important module for visualization of the ECG signal. By the means of this feature, the developers have an immediate insight on how does the analog representation of the digital signal look like, as if looking at a hard copied ECG recording. Therefore, this visualization module is responsible for displaying the consolidated ECG signal. The OpenGL libraries [6] have been used for the purpose of visualizing the data. Since the development environment has incorporated the C# programming language, a wrapper for the OpenGL libraries is needed. The choice of OpenGL wrapper has come down to the OpenTK wrapper [7], superseding the Tao Framework [8]. The application interface displays the additional information of the ECG signal and highlights the analog representation of the ECG signal in a standard ECG graph paper surrounding [9]. A standard record duration of 10 and 30 seconds are the 219 Fig. 2. A snapshot of the signal visualization GUI component most common length of an ECG record, as the information contained within a 10 seconds reading is sufficient for a standard diagnosis. Moreover, records with these lengths are optimal in terms of appearance since longer records would make the visualized signal appear cluttered. However, records with longer duration are not nonexistent. Certain diagnosis types require readings that last up to half an hour. Visualization of such records has been also supported, by enabling transition between consecutive 10 second strips, thus, maintaining optimal appearance. The size of the visualization component in the Graphical User Interface (GUI) is scalable, so that it adapts to the dimensions it is supposed to have, hence, the ECG recording being displayed scales up to provide maximum clarity and visibility. The height of the graph paper squares scales up as well, to conform with the signal’s amplitude. Optionally, a zero amplitude line can be displayed as well. The DSP module is another auxiliary application module used for filtering the digital signal from noise. 
The effectiveness of the upcoming second phase of ECG analysis can be significantly improved by applying a filter to the signal. The filtering can realize various DSP methods, such as, interpolation, moving average, regressive filters, differential filtering, wavelets, discrete Fourier transformations, etc. The module allows use of a combination of any of these filters and customizing their parameters. The processed signal is at request displayed within the same visualization module, next to the original signal - in order to give the developer an insight of how applying the filter and changing the filters’ parameters is affecting the signal. The GUI component is shown on Fig. 2. The way the ECG signal visualization module and the DSP module improve the effectiveness is shown on Fig. 3. The second phase involves the feature extraction module used for detection and localization of the signal components. The easiest way of doing this is to conduct a QRS complex detection first, as this signal component has the most distinctive form and is the easiest to pick up out of the ECG recording. The other two components, the P waves and T waves, can be optionally detected as well - by conducting a look up for one of each in the interval between two neighbor QRS 220 Fig. 3. An iterative process on how the ECG signal visualization improves the overall effectiveness complexes. This stage can be done in a number of different ways: by analyzing the signal’s curve slopes, by using pattern recognition, by looking for a local maximum at predefined intervals, etc. It is up to the developer to make an assessment which approach is of greatest interest. All the detected signal components are then collected in a logical structure, referred to as ECG signal component annotations, where an annotation MIPRO 2016/DC VIS Fig. 5. A snapshot of the ECG component detection visualization Fig. 4. An iterative process on how the pattern visualization model improves the overall effectiveness consist of the following couple: component index (time stamp) and component amplitude. In case of using a pattern recognition, the feature extraction module displays the pattern and the signal component entering the pattern. Because there are multiple components detected, this module enables selection of the signal component displayed on the ECG pattern visualization module. In case of undetected component, a specific part of the signal can be put on display under the pattern so that the pattern can be modified in order to better detect the components. This process is explained on Fig. 4. In case of using the expert system to diagnose a streaming data from an ECG sensor as a web service, the feature extraction module offers an excellent opportunity to tune and customize various filter and feature extraction parameters of the algorithms to match the specifics of the person. The ECG signal visualization module can also help with testing the effectiveness of stage two by specifically marking the signal components, i.e. by indicating which components have been successfully detected and which have not. The application’s interface offers selection of the signal components that are being displayed. For each type of components, if specified for display, an indicator is drawn right above their positions. MIPRO 2016/DC VIS The feature is shown on Figure 5. After the signal’s components have been temporally pinpointed (localized), the third phase is realized by the parametrization module. 
It features two types of analysis: obtaining temporal parameters and obtaining vector polarization parameters. The temporal parameters’ analysis starts with the calculation of typical component intervals. These intervals optionally include: PR, QRS, QT, PP and RR interval duration. Out of that data, additional temporal components are derived, such as Beats Per Minute (BPM), Heart Rate Variation (HRV), Beat Fluctuation (BFx), SDNN and other related parameters [10]. Then, the form of the signal components is being investigated into for the vector polarization analysis. Although there is small number of options how the parametrization can be done, since it comes down to a standard medical approach, the way the component extraction is handled offers significant flexibility and is, in fact, very similar to how the second phase - the component detection, is accomplished, as the signal’s amplitude is taken into consideration. In case of having only one lead in the ECG recording, very little relevant information can be given for the vector polarization, because the vector polarization is a derived compound from several leads that make up the QRS phasor [5]. Therefore, when processing one lead record, the vector polarization comprises only parameters such as abnormal amplitude levels and component width. The fourth phase consists of the diagnosis module that can closely correlate to how a cardiologist would conduct the ECG analysis. The information collected at the third phase is being used - a combination of a number of parameters, would trigger a medical condition marker. All the triggered markers are then reasoned into a closing composite diagnosis. Eventually, all the information gathered at the previous phases: the ECG signal and the additional ECG record data (phase one), the ECG component annotations (phase two), the typical ECG parameters (phase three) and the ECG diagnosis (phase four) are consolidated within the output module into an output file. Once again, it is up to the developer to determine which information is of interest and needs to be included in 221 the output. The output data can then be stored into a data base, sent through a network channel or stored locally as an individual file. Additional functionality of the final phase is the effectiveness check. There are databases that in addition to the ECG recordings offer supplemental component annotations. The annotations generated by the developed system and the pregenerated annotations supplemented to the ECG recordings from the databases can be automatically compared, via another auxiliary module, inbuilt in the output module. An extensive report of such comparison optionally accompanies the output file, and it includes analysis of the effectiveness indicators, such as the detection rate, hit rate, miss rate and extra rate. The application has a fully functional Graphical User Interface (GUI) that allows the developer to have an actual overview of the diagnosis given, as well as interfaces for handling the application’s input and output, a debug window and a status pane, as shown on Figure 6. There are additional interfaces for customizing the parameters used during the component detection process and for customizing the parameters used for the signal filtering. such as Java, C++, C#, and MATLAB. Oefinger et al. proposed a web service Java and CGI-based, which can plot ECG signals in the ECG database called Physionet [19]. Kartnik designed a MATLAB based ECG simulator [16], which can generate normal lead II ECG wave forms. 
Another example of a web-based ECG simulator is the ”The Six Second ECG Simulator” [20]. The simulator ”simECG” was developed using C++ language [21]. Additionally, an ECG simulator of particular interest is WebECG [22] where various ECG signals can be generated and plotted in 3-dimensional view with zooming and moving. Therefore, most of the related available products are in fact ECG simulators, meant for medical training of medical personnel or medical software developers. However, the system proposed in this paper would assist the medical software developers in a unique way. That is, not for training individuals on how to read and interpret ECG recordings, but to enable developers to design approaches for digital ECG processing and analysis. On the other hand, the proposed solution has functional similarities to other applications meant for diagnosis, such as the products by PhysioNet. IV. R ELATED WORK V. C ONCLUSION AND F UTURE W ORK There are several tools that help ECG diagnosis, and most of them are open source software packages or offered as web application. ECG has been attracting the researchers a lot and even the mathematical toolkits Mathematica [11] and MatLab [12] offer extensions for ECG interpretation. PhysioNet [13] have developed a large collection of software for viewing, analyzing, and creating recordings of physiologic signals, and their product Wave is a visual tool that enables an extensible interactive graphical environment for manipulating sets of digitized signals with optional annotations. An ECG simulator is a device, which can simulate ECG signals recorded previously in the real time mode [14]. The main goal of ECG simulators is to convert the digital ECG signals to analogue counterparts, which have been created in a computer graphic environment [15]. The ECG Simulator enables analysis and study of normal and abnormal ECG waveforms without actually using the ECG machine. One can simulate any given ECG waveform using the ECG Simulator [16]. This simulator can be used both for clinical training of doctors as well as for design, development and testing of automatic ECG machine without having real subject for this purpose. Most of the simulators on the market are offering services for training of readings of ECG and similar practical clinical skills. Nilsson et al. [17] propose a web-based ECGinterpretation program for undergraduate medical students. Lu et al. [18] designed a tool for modeling and simulation of electrophysiology. Kaur describes a simulator [14] that produces realistic ECG with the possibility of including other biological signals, e.g. blood pressure. In the literature, software-based ECG simulators have been developed by the help of different programming languages This developers’ environment has been built so that it meets the needs of an ongoing project. The project involves analysis of ECG data recorded by a set of wearable ECG sensors. An AI agent serves the sensors and is capable of detecting any abnormalities in the subject’s cardiac condition and also, capable of preventing critical scenarios on time. 
So far, the application has substantially contributed to improving the effectiveness of the component detection in a number of ways: by improving the pattern recognition approach and by testing different component detection methods, by contributing in the analysis of how different filters affect the original signal, by following a strict component based architecture which increases the code re-usability in case of transitioning to other platforms. One of the features that this tool offers is the transitioning from a desktop application to a web service. This is of high interest, because that way out R&D team could continue working on a fully web based platform, hence, make the corrections to the web server that is in charge of the processing of incoming ECG records without much effort. Howbeit, a greater variety of filters can be implemented, so that the users can choose one or combination of several, all in order to increase the accuracy of the diagnostics. At the same time, most of the added features have an effect that can be seen ”on the go” - all thanks to the visualizer module. To conclude, although the software was designed for a specific cause, it can easily be re-purposed or upgraded with functionalities, all due to its architectural modularity. It is a product that is not of exclusive interest to the team that build it, but assists the work of others that deal with similar challenges. The main benefit of this tool is the possibility to fine tune the parameters for DSP filtering and feature extraction in the ECG detection and, therefore, customize it towards a specific 222 MIPRO 2016/DC VIS Fig. 6. A snapshot of the visual tool for ECG QRS detection patient, especially in a case of using a streaming ECG sensor that continuously sends data to a diagnosis system. A machine learning approach can be used for the pattern parametrization in order to boost the effectiveness of the expert system. As future work, we plan to develop more filters and feature extraction algorithms. Additionally, the system is planed to support simultaneous processing of multiple recordings, hence, it will allow handling of entire databases and record sets. The effectiveness check that covers the annotations generated from a single file, will be extended to cover the annotations of multiple files and subject them to extensive statistical analysis. We believe that such expert system would be of particular interest for students and medical researchers. R EFERENCES [1] F. Gritzali, “Towards a generalized scheme for qrs detection in ecg waveforms,” Signal processing, vol. 15, no. 2, pp. 183–192, 1988. [2] A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, “Physiobank, physiotoolkit, and physionet components of a new research resource for complex physiologic signals,” Circulation, vol. 101, no. 23, pp. e215–e220, 2000. [3] Health Level Seven International. (2015, June) Hl7 version 3 implementation guide: Annotated ECG (aECG). [Online]. Available: http://www.hl7.org/implement/standards/product brief.cfm? product id=102 [4] R. R. Bond, D. D. Finlay, C. D. Nugent, and G. Moore, “A review of ECG storage formats,” International journal of medical informatics, vol. 80, no. 10, pp. 681–697, 2011. [5] T. B. Garcia et al., 12-lead ECG: The art of interpretation. Jones & Bartlett Publishers, 2013. [6] D. Shreiner, M. Woo, J. Neider, T. Davis et al., OpenGL (R) programming guide: The official guide to learning OpenGL (R), version 2.1. 
Addison-Wesley Professional, 2007. MIPRO 2016/DC VIS [7] S. Apostolopoulos, “The Open Toolkit library online documentation,” http://www.opentk.com/doc. [8] D. Hudson, R. Ridge, R. Loach et al., “The Tao Framework,” http: //sourceforge.net/p/taoframework/wiki/Home/. [9] E. Einthoven’s, “Ecg graph paper.” [10] M. Malik, “Heart rate variability,” Annals of Noninvasive Electrocardiology, vol. 1, no. 2, pp. 151–181, 1996. [11] T. Pogorelov. (1998, Sep) Ekg qrs detection algorithm. [Online]. Available: http://library.wolfram.com/infocenter/Demos/4476/ [12] R. Gupta, J. Bera, and M. Mitra, “Development of an embedded system and matlab-based gui for online acquisition and analysis of ecg signal,” Measurement, vol. 43, no. 9, pp. 1119–1126, 2010. [13] PhysioNet.org. (2015, July) The WFDB software package, software for viewing, analyzing, and creating recordings of physiologic signals. [Online]. Available: https://www.physionet.org/physiotools/wfdb.shtml [14] G. Kaur, “Design and development of dual channel ecg simulator and peak detector,” Thapar Institute of Engineering and Technology, Deemed University, Patalia, 2006. [15] C. Caner, M. Engin, and E. Z. Engin, “The programmable ecg simulator,” Journal of Medical Systems, vol. 32, no. 4, pp. 355–359, 2008. [16] R. Karthik. (2003) Ecg simulation using matlab. [Online]. Available: www.mathworks.com/matlabcentral/fileexchange/10858 [17] M. Nilsson, G. Bolinder, C. Held, B.-L. Johansson, U. Fors, and J. Östergren, “Evaluation of a web-based ecg-interpretation programme for undergraduate medical students,” BMC medical education, vol. 8, no. 1, p. 25, 2008. [18] W. Lu, D. Wei, X. Zhu, and W. Chen, “A computer model based on real anatomy for electrophysiology study,” Advances in Engineering Software, vol. 42, no. 7, pp. 463–476, 2011. [19] M. Oefinger and R. Mark, “A web-based tool for visualization and collaborative annotation of physiological databases,” in Computers in Cardiology, 2005. IEEE, 2005, pp. 163–165. [20] S. L. Canada. (2016) The six second ecg (cardiac rhythm simulator). [Online]. Available: http://skillstat.com/tools/ecg-simulator [21] M. J. Martins AC, Costa PD. (2016) simecg: Ecg simulator. [Online]. Available: http://simecg.sourceforge.net/ [22] E. Güney, Z. Ekşi, and M. Çakıroğlu, “Webecg: A novel ecg simulator based on matlab web figure,” Advances in Engineering Software, vol. 45, no. 1, pp. 167–174, 2012. 223 Analysis and comparison of algorithms in advanced Web clusters solutions D. Alagić* and K. Arbanas** NTH Media, Varaždin, Croatia Paying Agency for Agriculture, Fisheries and Rural Development, Zagreb, Croatia da@nth.ch krunoslav.arbanas@gmail.com * ** Abstract – Today's websites (applications) represent an essential part of nearly every business system, therefore it is unacceptable for them to be unavailable due to the everincreasing competition on the global market. Consequently, such systems are becoming more and more complex to allow for their high availability. To achieve a higher system availability, a greater scalability is used in creating the socalled Web farms or Web clusters. This system requires much more computer power than the traditional solutions. Since such systems are very expensive and complex in nature, the question is how to obtain the best possible results with the least amount of investment. To achieve that, it is necessary to take a look at the component which contains the information on the request or traffic amount, and that is the HTTP/HTTPS load balancer. 
The system is based on several algorithms, however, there are no comprehensive analyses that indicate which algorithm to use, depending on the Web cluster and the expected amount of traffic. For this reason, this paper provides a detailed comparison of several frequently used algorithms in several different Web cluster scenarios, i.e. loads. Also, examples are given as to when to use a certain algorithm. I. INTRODUCTION One of the first cloud computing solutions were simple Web sites, i.e. portals that have become a lot more complex in the past two decades due to globalization of the market. Today, companies and organizations increasingly base their operations on the forms of computing that make the information available anytime and anywhere. That is why no company or organization can afford for their Web sites to be unavailable, since that is detrimental to their reputation, business opportunities and revenue [4][14]. In order to provide constant availability of the system, both architecture and infrastructure of these solutions have become a lot more complex and, consequently, more expensive. Therefore, it is almost impossible to find the traditional solution for the Web sites; instead, everything is done with the help of Web clusters or Web farms [17]. Such cluster solutions present a kind of parallel and distributed computer system consisting of computers which are interconnected (in this case a Web server), which represent one computing resource whose performance is far greater than that of regular computers. This allows for a horizontal way of spreading computing resources to more instances of Web servers that are usually in the form of virtual machines (multiple usage of the existing resources) and a lot 224 cheaper than the vertical growth, which represents a strong and independent purchase of powerful and expensive servers [16][20]. Such solutions offer many advantages such as a scalable growth, meaning that it is possible to add more Web servers to handle HTTP / HTTPS requests depending on the increase of the traffic. In addition, such a system allows for greater availability because the requests - with the help of the HTTP / HTTPS load balancer (hereinafter referred to as the LB) - are being distributed on several Web servers (multi-node architecture). Due to that, the chance of a single point of failure is decreased. Since the system consists of several Web servers, such solutions are more favorable when dealing with necessary system changes and upgrades, because it is possible to work on the servers one by one without causing any downtime unlike the traditional concept [8][19]. In order for that to be possible, the architecture of such systems is notably more complex, i.e., it consists of multiple components, thus entailing a greater risk that one of them will undergo an "operational dropout" or a malfunction that would result in the collapse of the entire system. In addition to a larger number of components (servers, storage, etc.), the system is more complex because it uses a more sophisticated equipment with various distributed systems, as well as a large number of protocols for all that to function without a flaw. Besides, such complex systems are quite expensive because they generate large OPEX and CAPEX costs, wherein their usage is not always brought to the maximum. 
An example of OPEX costs is the maintenance of the infrastructure (servers, network, etc.), as well as other monthly expenses: Internet connection, electricity, air conditioning, renting of the space, etc. An example of CAPEX costs is an investment in new servers or network equipment necessary for the operation of Web farms [5].

Figure 1. Schematic representation of the communication between the end user and the Web server (the traditional model)

Figure 2. Schematic representation of the communication between the end user and the Web servers in the Web farm

The main problem here is that the equipment amortization rate is very high, so the equipment has to be put to multiple uses [10][21]. Due to all these costs, the already existing resources need to be used in the best possible way to minimize the costs, while making sure not to hinder the results. To do so, we will take a look at one of the main components of a Web farm, the LB system, which is based on a number of algorithms that have been used for many years. Considering the fact that these algorithms are widely accepted in LB systems and many other systems, the question is which algorithm to use, and in which case, for the optimal utilization of the system. In order to find out which algorithm will provide the best results in certain cases, it is necessary to fully understand their operating principles. Depending on how the tasks (i.e. the requests) are distributed, LB algorithms are divided into static (SLB) and dynamic (DLB) ones [6][13][14]. In SLB algorithms, the requests are assigned to the executors (i.e. nodes) from the start, depending on their characteristics. During operation, it is not possible to change the tasks or to increase the number of nodes. Another disadvantage is that SLB algorithms do not collect any feedback on the nodes' status [1][9]. Due to these limitations, we will further consider only the DLB algorithms, since they are quite flexible and can easily be modified and improved. The following table shows a comparison of the existing DLB algorithms. This paper will not compare and test all of those algorithms, but only those which are commonly used and generally available (open source).

TABLE I. COMPARISON BETWEEN DIFFERENT ALGORITHMS FOR THE DYNAMIC LOAD DISTRIBUTION [1]

Algorithm | The possibility of verifying the state | Performance
Rand | No | Excellent
Prand | Partially | Excellent
Acwn | Yes | Good
Pacwn | Yes | Good
Cyclic | Partially | A little better than Rand
Probabilistic | Partially | Good
Threshold | Partially | Better
Least | Partially | Better
Reception | Partially | Not so good
Central | Yes | Excellent
Global | Yes | Good
Offer | Yes | Bad
Radio | Yes | Good
Sed | Yes | Good
Nq | Yes | Good

As previous research has shown (see the table above), there is not a large number of DLB algorithms used in different situations, i.e. there is no universal DLB algorithm that can be applied to all distributed computing systems [1]. Although almost all DLB algorithms have the ability to verify the status of the nodes, i.e. the flow of information and their load, none of them has complete information on the status of compute power. Their verification applies only to the status and availability at the application level, but not at the level of the operating system. Thus, it is presumed that an LB system could be more effective if it had a complete overview of the system, i.e. an overview of its resources and the actual load. However, that is a topic for another research.
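To make the SLB/DLB distinction above concrete, the sketch below contrasts a static, weight-only assignment with a dynamic choice that consults per-node feedback. It is a minimal illustration under assumed node names and counters, not part of the test environment described later in this paper.

# Minimal illustration of the SLB/DLB distinction discussed above.
# Node names and connection counters are hypothetical.
from itertools import cycle

NODES = ["web1", "web2", "web3"]
WEIGHTS = {"web1": 3, "web2": 1, "web3": 1}        # fixed up front (static)
active_conns = {"web1": 0, "web2": 0, "web3": 0}   # runtime feedback (dynamic)

# Static: the schedule is decided before operation and ignores runtime state.
static_schedule = cycle([n for n in NODES for _ in range(WEIGHTS[n])])

def static_pick():
    return next(static_schedule)

# Dynamic: each request consults the current node state (open connections here).
def dynamic_pick():
    return min(NODES, key=lambda n: active_conns[n] / WEIGHTS[n])

def handle_request(dynamic=True):
    node = dynamic_pick() if dynamic else static_pick()
    active_conns[node] += 1    # a real balancer would decrement on completion
    return node

if __name__ == "__main__":
    print([handle_request(dynamic=False) for _ in range(6)])  # weighted cycle
    print([handle_request(dynamic=True) for _ in range(6)])   # follows feedback

The only point of the contrast is that the dynamic variant reacts to state it can observe, which is exactly the application-level feedback referred to in the discussion of the table above.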
II. RESEARCH PROBLEM Previous research in cloud computing is related to the load balancer systems in Web clusters [2][3][11] [16][17][18] [19], but also to the algorithms used in such arrangements [1][7][15][17][20]. There are many researches that deal with the topic, but there are two problems: 1) Testing repeatability - in the tested environment, the measurements are described poorly or not at all, while the test samples are quite specific. In other words, it is not possible to track those results in order to further expand the existing research. 2) Partial analysis - those load balancer algorithms that were compared are incomplete, i.e. a small number of algorithms was compared, and in many cases the observed variables are not clearly stated or defined. For this reason, this paper will provide a complete analysis of the LB system as well as the comparison between its algorithms. During this research, the test environment will be described in detail and the applied technologies will be open source to enable for a higher solution availability which enables the obtained data to be used for future research. III. DATA SOURCES The relevance of the data is provided by inclusion of the data from a number of IT companies which provide cloud computing services, or more accurately, Web hosting services. These solutions are based on the Web clusters that were previously explained. 225 All research and testing was carried out in one of the data centers with a full access to the equipment and the necessary financial data, i.e. the indicators essential for this research. In the given example, it is evident that the company has a number Web clusters, categorized according to the type of business or content. There are several reasons as to why this type of architecture was opted for; one of them is to achieve a high system availability, since a certain group of Web sites is constantly exposed to security attacks causing a hindrance and potential unavailability of the system. The chart also makes it possible to conclude that the resource utilization is not linear and that it depends on the type of the content. In other words, informative and business Web sites are visited more during the day, while adult sites and sites containing video games are frequented by night. Given that there is a large part of computing resources that are hibernating (that are not used optimally in the course of 24 hours), the question is how to increase the level of utilization of the computing resources. If we present all this data on one chart, i.e. on a single Web cluster, it can be noted that an overload is present on the entire cluster. In other words, not even a larger system could handle the entire load, wherein there would be time periods when it would have very little traffic or load. Thus, whether it be a number of separate Web clusters or one major cluster, the computing resources would not be used properly, and if we go back to the beginning of this topic, we can remember how these resources generate large monthly and capital costs. Thus, the goal is to maximize the existing resources without increasing the number of Web servers, as it is done with commercial tools for the virtualization and configuration of Web clusters, such as VMware. Such tools perform a continuous monitoring of all Web servers and their resources, causing a deploy of virtual machines or Web servers in critical situations (e.g. in the case of a sudden increase of traffic and the subsequent system overload). 
This guarantees for the stability and availability of the system, but it also generates additional costs due to the increased number of servers. One way to make a better use of computer resources is to choose the best LB algorithm depending on the situation and the amount of traffic that is present or expected. This may not necessarily be true, but in order to remove any doubt, this paper provides the analysis and the comparison of a number of most commonly used algorithms according to several different cases varying in the request amount. The results will be gathered and compared from a number of computer resources monitoring tools, but also from the LB system in which it is possible to monitor the load and the number of visits in real time. The purpose is to determine whether considerable savings are possible in regard to the computing resources, or if the results are so similar that it does not matter neither which algorithm is applied nor in which case. 226 Figure 3. The system load by Web site category for 24 hours Figure 4. The total system load in all Web sites categories of for 24 hours IV. TEST ENVIRONMENT AND CASES The entire test environment is based on open source tools, services and operating systems allowing for reproducibility and greater availability of for further analysis and research. Given that the test environment's main objective is to test the load of Web nodes, the complete redundancy of all service was not employed: e.g. only one LB was configured, but not his passive replica. Also, in order to simplify the reproducibility of the test, classic local hard drives were used rather than a central storage, since the main focus is on the computing resources of the Web server and not on the data processing performance of the storage. However, those local disks contain a database that is configured in master - master replication, i.e. an active synchronization of data between two databases. The environment consists of two Web clusters with three Web servers, one of which is Apache and the other nginx. Both Web services are included in order to fully test both of them, depending on the type of content (static/dynamic). As for the Website, or the content, a generic Web site for CMS (Content Management System) system was used, called WordPress (available at the following URL: https://wordpress.org/latest.zip). For the testing purposes, one Web site was installed on each Web cluster. The content was not altered so that the test could be repeatable, as well as used for further research. As far as the infrastructure is concerned, the solution consists of three physical servers with the following virtual machines:  Physical server host1 with virtual machines: dtlb1 (LB), dt-db1 (first database) and dt-db2 (second database).  Physical server host2 with virtual machines: dtweb1, dt-web2, dt-web3 (nginx Web cluster). MIPRO 2016/DC VIS  Physical server host3 with virtual machines: dtweb4, dt-web5, dt-web6 (Apache Web cluster). Server specifications:  host1: 2x CPU, both quad-core, RAM 32GB, local hard drive 140GB.  host2: 2x CPU, both quad-core, RAM 32GB, local hard drive 140GB.  host3: 2x CPU, both quad-core, RAM 32GB, local hard drive 140GB.  dt-lb1: CPU, quad-core, RAM 4GB, local hard drive 10GB.  dt-web1: CPU, quad-core, RAM 4GB, local hard drive 10GB.  dt-web2: CPU, quad-core, RAM 4GB, local hard drive 10GB.  dt-web3: CPU, quad-core, RAM 4GB, local hard drive 10GB.  dt-web4: CPU, quad-core, RAM 4GB, local hard drive 10GB. 
 dt-web5: CPU, quad-core, RAM 4GB, local hard drive 10GB.  dt-web6: CPU, quad-core, RAM 4GB, local hard drive 10GB.  dt-db1: CPU, quad-core, RAM 4GB, local hard drive 20GB.  dt-db2: CPU, quad-core, RAM 4GB, local hard drive 20GB.    Test environment specifications:  Physical server: HP ProLiant DL360 G7  OS: CentOS 6.7  Virtualization platform: Foreman Version 1.9.1 and Libvirt (virsh) 0.10.2  Database: MySQL mysql Ver 14.14 Distrib 5.6.12  LB: HA-Proxy version 1.5.4 2014/09/02  Scripting language: PHP 5.6.13  Web servers: Apache/2.2.15 and nginx/1.8.0  CMS (Web site): WordPress Version 4.3 Tools used for the assessment and comparison of the results:  ApacheBench - enables traffic generation for a specific Web site  Observium 0.15.12.7231 - monitoring of computer resources and service load  HAProxy version 1.5.4 - monitoring of traffic, i.e. the load of Web serves in real time Three different cases were employed for test purposes:  Case 1 - Linear load of 100 consecutive requirements  Case 2 - Load of 1000 requirements, 200 are done simultaneously  Case 3 - Load of 10000 requirements, 500 are done simultaneously in the course of 10 seconds Tested algorithms [12]:  Roundrobin - according to this algorithm, each server is used in turns according to the weights. It works best MIPRO 2016/DC VIS   when the processing time on the server is equally distributed. It is dynamic in nature, meaning that the server weight can be adjusted. It is limited to 4095 active servers per backend. In some particular (and very rare) cases, it can take several hundred requests for the server to be re-integrated after being down for a short time. Static-rr - just as with the previous algorithm, each server is used in turns according to the weights. However, this algorithm is static, meaning that if the server’s weight is changed during the process, it will have no effect whatsoever. Next, this algorithm is not limited regarding the number of servers, and when the server goes up, it is always immediately reintroduced into the farm. It also uses slightly less CPU to run (approximately 1% less CPU). Leastconn - with this algorithm, the server with the lowest number of connections will receive the connection. Round-robin is performed within server groups working with the same load to ensure that all servers will be used. It is recommended to use this algorithm with very long sessions, such as LDAP, SQL, TSE, etc. Yet, it is not very well suited for protocols using short sessions, such as HTTP. The algorithm is dynamic, meaning that server weights may be adjusted during the process. First - here, the first server with available connection slots will receive the connection. Once a server reaches its maxconn value, the next server will be used. The purpose of this algorithm is to always use the smallest number of servers so that extra servers can be powered off during non-intensive hours. The server weight is not important, and the algorithm is more efficient in long session such as RDP or IMAP than HTTP, though it can be used there as well. To efficiently use this algorithm, it is recommended to check server usage regularly and turn off unused servers, as well as to regularly check backend queue to turn servers on when the queue inflates. Alternatively, using "http-check send-state" may provide information on the load. 
Source - The source IP address is hashed and divided by the total weight of servers to determine which server will receive the request, ensuring that the same client IP address will always reach the same server if the number of servers remains the same. If the hash result changes, many clients will be directed to a different server. The algorithm is generally used in TCP mode where no cookie may be inserted, but can also be used to provide a best-effort stickiness to clients refusing session cookies. It is static by default, meaning that changing a server's weight in the process will have no effect, but this can be altered by using "hash-type". Uri - This algorithm hashes either the left part of the URI or the whole URI and divides the hash value by the total weight of servers, determining which server will receive the request. This is used with proxy caches and anti-virus proxies in order to maximize the cache hit rate. Note that this algorithm may only be 227 used in an HTTP backend. It is static by default, meaning that changing a server's weight in the process will have no effect, but this can be changed using "hash-type". The algorithm supports two optional parameters - "len" and "depth", both followed by a positive integer number, which can be helpful when you need to balance the servers based on the beginning of the URI only. With the "len" parameter, the algorithm only considers that many characters at the beginning of the URI to compute the hash. The "depth" parameter indicates the maximum directory depth to be used to compute the hash.  rdp-cookie - The RDP cookie <name> (or "mstshash" if omitted) will be looked up and hashed for each incoming TCP request. This is useful as a degraded persistence mode, always sending the same user (or session ID) to the same server. If the cookie is not found, the normal roundrobin algorithm is used instead. Note that the frontend must ensure that an RDP cookie is already present in the request buffer. For this you must use 'tcp-request content accept' rule combined with 'req_rdp_cookie_cnt' ACL. This algorithm is static by default, meaning that changing a server's weight in the process will have no effect, but this can be changed using "hash-type".  url_param - The URL parameter specified in argument will be looked up in the query string of each HTTP GET request. If the modifier "check_ post" is used, an HTTP POST request entity will be searched for the parameter argument, when it is not found in a query string after a question mark. The message body will be analyzed only after the advertised amount of data has been received or if the request buffer is full. In the rare case of using chunked encoding, only the first chunk is searched. If the parameter is followed by the “=” sign and a value, the value is hashed and divided by the total weight of the running servers, designating which server will receive the request. This is used to track user identifiers in requests and ensure that a same user ID will always be sent to the same server if the number of servers remains the same. If no value or parameter is found, a round robin algorithm is applied. Note that this algorithm may only be used in an HTTP backend. It is static by default, meaning that changing a the weight in the process will have no effect, but this can be changed using "hash-type".  hdr - The HTTP header <name> will be looked up in each HTTP request. Just as with the equivalent ACL 'hdr()' function, the name in parenthesis is not case sensitive. 
If the header is absent or does not contain any value, the roundrobin algorithm will be used instead. An optional 'use_domain_only' parameter is available for reducing the hash algorithm to the main domain part with some specific headers such as 'Host'. This algorithm is static by default, which means that changing a server's weight on the fly will have no effect, but this can be changed using "hash-type".

Figure 5. Test environment architecture

The algorithms url_param and hdr were left out since they are very specific and rarely used in clusters with a larger number of domains.

V. COMPARISON AND RESULTS

The testing was done on the Apache and nginx Web clusters for all the above-mentioned cases. During testing, two main variables were monitored: the time necessary for the processing of all requests (measured in seconds) and the system load, measured by the amount of computational work performed by the system.

TABLE II. APACHE WEB CLUSTER – CASE 1
Algorithm | Time of execution (s) | Average system load
leastconn | 22.448 | 0.5566
roundrobin | 17.117 | 1.10333
static-rr | 16.098 | 0.71000
first | 21.604 | 0.07000
source | 3.128 | 0.02666
uri | 2.870 | 0.000001
rdp-cookie | 19.358 | 0.07666

TABLE III. APACHE WEB CLUSTER – CASE 2
Algorithm | Time of execution (s) | Average system load
leastconn | 44.078 | 29.63667
roundrobin | 34.655 | 23.50667
static-rr | 34.802 | 22.73667
first | 103.678 | 45.97333
source | 13.239 | 1.55666
uri | 8.979 | 2.59
rdp-cookie | 33.612 | 21.41333

TABLE IV. APACHE WEB CLUSTER – CASE 3
Algorithm | Number of successfully executed requests | Average system load
leastconn | 581 | 9.65
roundrobin | 402 | 5.26
static-rr | 252 | 23.65666
first | 22 | 5.40334
source | 1009 | 2.87666
uri | 1139 | 3.11
rdp-cookie | 10.037 | 4.24333

TABLE V. NGINX WEB CLUSTER – CASE 1
Algorithm | Time of execution (s) | Average system load
leastconn | 17.599 | 0.00667
roundrobin | 16.301 | 0.07000
static-rr | 16.336 | 0.10333
first | 23.504 | 0.02
source | 0.087 | 0.000001
uri | 0.084 | 0.000001
rdp-cookie | 19.844 | 0.00666

TABLE VI. NGINX WEB CLUSTER – CASE 2
Algorithm | Time of execution (s) | Average system load
leastconn | 6.771 | 0.38333
roundrobin | 35.342 | 20.8
static-rr | 33.843 | 20.221666
first | 68.544 | 27.9666666
source | 0.309 | 0.000002
uri | 0.188 | 0.000001
rdp-cookie | 35.993 | 22.95333

TABLE VII. NGINX WEB CLUSTER – CASE 3
Algorithm | Number of successfully executed requests | Average system load
leastconn | 25272 | 0.61333
roundrobin | 404 | 6.23
static-rr | 351 | 3.53667
first | 69 | 4.63667
source | 50000 | 0.000001
uri | 50000 | 0.000001
rdp-cookie | 403 | 6.33

VI. CONCLUSION

The analysis shows that some algorithms provide much better results than others. Namely, when looking at the ratio of spent resources and the execution time for the first case of the Apache cluster, the best results were obtained by the source and uri algorithms. These algorithms also provided the best results in the second case, while the worst second-case result was given by the algorithm first (two to three times worse results). In the case of a higher load, i.e. in the third case for the Apache cluster, the best results were again obtained by the algorithms source and uri. With the help of these algorithms, the largest number of successfully processed requests with a lower system load was obtained in comparison with the remaining algorithms. A similar situation was found in the nginx cluster, where the best results in all the cases were again obtained with the help of the source and uri algorithms, while the algorithm first again gave the worst result.
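One plausible, structural reason for the advantage of source and uri in the tables above is that both map a request to a backend with a single hash and keep no per-request state. The sketch below illustrates that idea under simplified assumptions; the hash function, server list and modulo mapping are stand-ins for explanation only, not the actual HAProxy implementation.

# Simplified illustration of hash-based backend selection in the spirit of the
# "source" and "uri" balance algorithms; not the actual HAProxy implementation.
import hashlib

SERVERS = ["dt-web1", "dt-web2", "dt-web3"]   # one of the tested clusters

def _bucket(key: str, n: int) -> int:
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n

def pick_by_source(client_ip: str) -> str:
    """Same client IP -> same server while the server count is unchanged."""
    return SERVERS[_bucket(client_ip, len(SERVERS))]

def pick_by_uri(uri: str) -> str:
    """Same URI -> same server, which is what maximizes cache hit rates."""
    return SERVERS[_bucket(uri, len(SERVERS))]

if __name__ == "__main__":
    print(pick_by_source("203.0.113.7"))        # deterministic for this IP
    print(pick_by_uri("/index.php?p=contact"))  # deterministic for this URI

Because the mapping is a pure function of the request, the balancer does no bookkeeping between requests, which is one plausible explanation for the very low system load measured for these two algorithms.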
Other algorithms provided more or less the same results, except for the algorithm leastconn, which provided solid results in the third case, compared to algorithms other than source and uri. computer resources which in turn positively reflect on the CAPEX and OPEX costs mentioned at the beginning. REFERENCES [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] Ali, M.F.: The Study On Load Balancing Strategies In Distributed Computing System. Int. J. Comput. Sci. Eng. Surv. 3, 2, 19–30 (2012). Aversa, L., Bestavros, a.: Load balancing a cluster of web servers: using distributed packetrewriting. Conf. Proc. 2000 IEEE Int. Performance, Comput. Commun. Conf. (Cat. No.00CH37086). (2000). Chen, W. et al.: Design and implementation of server cluster dynamic load balancing in virtualization environment based on OpenFlow. Proc. Ninth Int. Conf. Futur. Internet Technol. 691– 697 (2014). Chung-Cheng Li, Kuochen Wang: An SLA-aware load balancing scheme for cloud datacenters. Int. Conf. Inf. Netw. 2014. 58–63 (2014). Gruber, C.G.: CAPEX and OPEX in Aggregation and Core Networks. 2009 Conf. Opt. Fiber Commun. - incudes post deadline Pap. 9–11 (2009). Iyer, S.: Load balancing and parallelism for the internet. July, (2008). Ji, Z., He, B.: A dynamic load balancing method for parallel rendering and physical simulation system based sort-first architecture. Proc. 2011 Int. Conf. Comput. Sci. Netw. Technol. ICCSNT 2011. 3, 1792–1796 (2011). Jung, J. et al.: Self-Adapting Load Balancing for DNS. 564–571 (2001). Khiyaita, a. et al.: Load balancing cloud computing: State of art. Netw. Secur. Syst. (JNS2), 2012 Natl. Days. 106 – 109 (2012). Knoll, T.M.: A combined CAPEX and OPEX cost model for LTE networks. (2014). Lin, Z. et al.: A content-based dynamic load-balancing algorithm for heterogeneous web server cluster. Comput. Sci. Inf. Syst. 7, 1, 153–162 (2010). Luke, S. et al.: Essentials of Metaheuristics A Set of Undergraduate Lecture Notes by BibTEX : (2011). Mcheick, H. et al.: Load Balancing Mathematical Model. 2011 Dev. E-systems Eng. 581–586 (2011). Pham, V. et al.: Gateway load balancing in future tactical networks. Proc. - IEEE Mil. Commun. Conf. MILCOM. 1844– 1850 (2010). Popa, L. et al.: A Cost Comparison of Data Center Network Architectures. Proc. 6th Int. Conf. Co-NEXT ’10. (2010). Rajavel, R.: De-Centralized Load Balancing for the Computational Grid Environment. Comput. Intell. 419–424 (2010). Teodoro, G. et al.: Load balancing on stateful clustered Web servers. Proceedings. 15th Symp. Comput. Archit. High Perform. Comput. (2003). Ungureanu, V. et al.: Effective load balancing for cluster-based servers employing job preemption. Perform. Eval. 65, 8, 606–622 (2008). Werstein, P. et al.: Load Balancing in a Cluster Computer.Pdf. (2006). Yong Meng Teo, Ayani, R.: Comparison of Load Balancing Strategies on Cluster-based Web Servers. Simulation. 77, 5-6, 185–195 (2001). Opex-based data centre services: Co-location, managed services and private cloud business support. 7,. The purpose of the testing was to identify the algorithms which provide the best results in various cases. This research singles out algorithms source and uri, since they are multiple times more successful that the rest, which is a surprising fact since algorithms such as lastconn and roundrobin are most frequently in use. 
It was noted not only that algorithms can provide excellent resuts, but also that they can enable multiple savings of MIPRO 2016/DC VIS 229 Metamodeling as an Approach for Better Computer Resources Allocation in Web Clusters D. Alagić* and D. Maček ** NTH Media, Varaždin, Croatia UniCredit S.p.A. Zweigniederlassung, Wien, Austria da@nth.ch davor.macek@foi.hr * Abstract - Constant changes are inherent to information technology, the proof of which can be found in many challenges, such as the business sector’s cloud computing. Because of the recent economic and financial crisis companies are forced to provide as best results as possible with the least amount of costs. The main issue of cloud computing are the computer resources (i.e. compute power). Due to their less than optimal usage, computer resources generate high-level costs. The term cloud computing encompasses a wide range of technologies, therefore this paper will focus only on Web clusters. It will describe the issue of improper use of resources in these systems and its main cause, as well as provide suggestions for further optimization. Additionally, the paper will present a new concept of HTTP / HTTPS traffic analysis which should enable a more efficient usage of computing resources. The concept has emerged from metamodeling of two methods. The first one is method for distribution of HTTP / HTTPS requests, and the other one is the Analytic Hierarchy Process (AHP) method which is used for classification and prioritization of requirements so that the entire Web cluster could operate as best as possible with the least amount of resources. I. INTRODUCTION Note: The views expressed in this article are those of the authors and do not necessarily reflect the views of the NTH Media, UniCredit S.p.A., or Zagrebačka banka d.d. In the last decade, the concept of cloud computing has been used more frequently and we could say that it has become an important part of any modern business system as such. Cloud computing is a type of computing that relies on sharing the computing resources with another computer, starting with the applications and including a variety of related services [4][9][10]. In order for the cloud computing to work, an IT (Information Technology) infrastructure is necessary, i.e. the Data Center. It is a well-known fact that such centers are not cheap, considering their direct, indirect and general costs. For a data center to survive in today's market, continuous investments in the growth and improvement of the system are required. In other words, capital expenditures (CAPEX) as well as operating costs (OPEX) are present [8][21]. The following chart represents the total cost of the ownership of a single data center. 230 Figure 1. Total Cost of Ownership for DC [16] An example of CAPEX cost in a single data center represents an investment in new servers or network equipment, while OPEX cost refer to the cost of maintaining the system, such as: Internet connection, electricity, air conditioning, space, maintenance and replacement of hardware, rental space, etc. [11][16]. Due to these costs, the cloud computing providers are trying to engage in multiple usage of the existing resources. In other words, they strive to achieve the automation and optimization of the system. Previous results have shown that servers represent the biggest costs because they result in high capital costs, and at the same time have a high amortization rate. They also generate high maintenance costs [16]. 
Therefore, the question is how to apply multiple usage to such infrastructure, i.e. to servers in order to minimize the mentioned costs. Due to this problem, the virtualization of computing resources appeared thirty years ago with which it has become possible to distribute the resources of a single physical device (in this case, the server) between several, smaller virtual environments, i.e. between a number logical or application process [15][19]. However, virtualization is now present in all data centers and it is not "sufficient" as such [18]. In other words, it is necessary to search for new ways and technologies for multiple utilization and optimization of computing resources next to virtualization, since that would result in a smaller number of servers, and ultimately reduce the consumption of electricity and other operating expenses. Thus, by employing a single variable - computer power - multiple savings are possible. To achieve it, it is necessary to define the limits, i.e. to define what kind of technology and services will be considered, since cloud computing includes a number of them and MIPRO 2016/DC VIS also, they are rather specialized in terms of allocation and utilization of resources. Cloud computing services are categorized as follows: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS) [3][5][9]. This paper mainly focuses on PaaS and SaaS services based on the HTTP / HTTPS protocol. An example of such services are Web sites where the user may or may not have a complete access to and control over the application and its contents or the data. Sets of such combined solutions, i.e. Web sites are called Web farms, whose architecture and design depend on the technology and type of content (static or dynamic). For a better understanding of the main cause of computer power consumption, we have to start from scratch, i.e. from the amount of information that must be processed in the allocated time. II. RESEARCH PROBLEM In the past decade, there has been a rapid increase in the use of cloud computing. To enable the existence of such infrastructure, a large number of data centers was made [6][10]. Such infrastructure is quite expensive considering that its primary goal is to allow continuous operation of all cloud computing services. In order to allow for it, the entire system must be redundant (power installations, UPS, air conditioning, network infrastructure, etc.), wherein it has to have a spare solution or system (a power generator, a BGP connection, other geographic collocation for business continuity purposes, etc.). All these items generate extremely high CAPEX and OPEX costs which represent the main motivation for improvement so as to allow savings at all levels of the infrastructure and services [16]. Furthermore, this paper pinpoints the problems as well as open issues representing the main motivation for this research: • High costs of data centers - High operating costs, such as maintenance and electricity consumption, as well as the renting of the DC area which is usually charged by the square meter. Large capital costs at every growth systems and at the same high rate of depreciation on this type of goods. In other words, the question is: how to allow for a multiple usage of resources so as to decrease the amount of investment. • System complexity and availability - Nowadays, all important business systems (but also all others) are located in the data center. 
The vast majority of these systems must not have any interruptions in their work, meaning that the information systems and infrastructure of a data center must be available at all times. In order to achieve that, information systems are becoming increasingly complex, i.e. they have an increasing number of components to be installed, configured and maintained (e.g. redundancy, backup, etc.). Thus, the question is how to simplify such systems without jeopardizing their functionality and availability. There is a number of commercial solutions, but are complex and expensive, and they MIPRO 2016/DC VIS also require a lot more resources and investments in infrastructure for the availability of the information systems. • Computer power utilization - As it was already mentioned, the virtualization of resources has existed for the past few decades. However, the systems are becoming larger and more complex, so that the current solutions are no longer sufficient if we want to achieve a greater utilization of computer resources. Modern solutions for a multiple utilization of resources are mostly specialized, i.e. their application scope is limited, and thus the savings are not large-scale. In other words, the solution should be scalable and adaptable to multiple systems. Due to all these problems, the question is how to achieve a more efficient utilization of computing resources in order to reduce the costs while simplifying the system. Today, there are several models of Web clusters. However, all of them share the system for the allocation of HTTP / HTTPS requests (hereinafter referred to as the LB system). Since the LB system contains information on the real amount of traffic which ultimately affects the computing resources, it can be assumed that it is possible to achieve the goals of this research if we improve the operation of the LB system. To achieve that, this paper analyzes two methods whose combination, or metamodel, should allow for more efficient results. III. METHODS AND THEIR MODELS This paper presents two methods: the first method represents the basis for the operation of the LB system, while the second method serves for the prioritization of the type of Web traffic in the cloud infrastructure. 1) Load Balancing method The method which represents the basis of the LB system is always the same, only the operating algorithms change. The following algorithms are well-known:  Round robin – This method for the allocation of HTTP/HTTPS traffic is based upon one of the simples and most frequently used algorithms. The entire procedure is done by allocating a time interval to each process in which it has to be done. Upon the completion of the allocated time interval, the system will stop the request and go on to the next process on the list. If the process is completed before the time runs out, the system will process another request. Having used its allocated time interval, the process goes on to the end of the list. It is important to note that the algorithm ensures that a time interval is assigned to each and every request. The main issue here is the determination of the optimal length of time. If shorter, the system assigns a larger number of requests, thus consuming more processor time while transferring requests from one process to the other. If the time interval is longer than necessary, the response for the execution of 231  requests is also longer, resulting in a decreased number of processed requests [12]. 
Weighted Least Connections – As with the previous method, it is likewise necessary to have a knowledge of the system and the expected number of requests, since that is the basis for the determination of the maximum number of requests (connections) which can be processes by a single nod. The process is done in cycles, wherein the request first go to the node with the largest number of free connections, i.e. the largest amount of available resources [12]. The metamodel of the method for the allocation of HTTPS/HTTPS request is shown through the example of a company dealing with Web hosting. The following table shows all key entities and their attributes. TABLE I. METAMODEL METHOD OF THE LB SYSTEM Data model Appearance data Department: “213” Department name: “ICT” Team: “IT” ICT Resource: “server specification” Web cluster: “web01corporate” Result: “url” 10 CPU cores http://www.co and 50 GB of mpany.com/c Memory ontact Entity: knowledge Attribute: knowledge_id department_id Entity: Entity: “person” “method” Attribute: Attribute: “person_id” “method_id” “first_name” “lb_algorithm_i “last_name” d” “type_of_job” Person: “422432” Method: First name: “Load “Tom” Balancing” Last name: Algorithm: “Clark” “Round Robin” Type of job: “Technician” Tom Clark Entity: it_service Attribute: service_id Appearance data Entity: “process” Attribute: “process_id” Entity: Entity: “resource” “result” Attribute: Attribute: “resource_id” “result_id” “web_cluster_id” Process: “HTTP GET request” Knowledge: “131” Department: “213” Service: “562” Real world appearance Data model Entity: “department” Attribute: “department_id” “name” “team” Real world appearance Meta model Object: Entity Attribute http://www.company.c om/contact_form.asp? name1=value1&name 2=value2 Web Clustering Hosting Web Site Round Robin Entity: external_sources Attribute: end_user_id http_method http_request End user: “33.176.216.1” HTTP method: “POST” HTTP request: “http://www.compa ny.com” 33.176.216.1 Figure 3. ER model - Load Balancing method Figure 2. Business model - Load Balancing method 232 2) Method for the prioritization of the type of Web traffic in the cloud infrastructure It is clear that load-balancing algorithms for the utilization of some parts of the cloud infrastructure are already developed and well-known. But there is a problem regarding the efficient prioritization of the usage of Web farm resources according to the type of Web traffic. For example, due to high CAPEX and OPEX costs, it would be quite unacceptable to allocate a Web MIPRO 2016/DC VIS The Analytic Hierarchy Process is among the most widely exploited decision making techniques when the decision is based on several tangible and intangible criteria and sub-criteria. It is recognized as one of the leading theories in multicriterion decision making field. The application of the AHP technique has increased significantly in the recent years, especially in the field of IT due to its mathematical proven basis. The reason for choosing the AHP and not some other MCDM technique lies in the fact that there were already many proven researches on applying the AHP to the load-balancing MIPRO 2016/DC VIS The following table shows the metamodel (Table II.) of the AHP method for the prioritization of the cloud Web traffic, according to which the business model was designed (Figure 4.). From the defined business model shown in Figure 4, the ERA (Entity-RelationshipAttribute) model was derived with all of its tables, attributes and necessary relationships (Figure 5.). TABLE II. 
METAMODEL OF THE AHP METHOD FOR WEB TRAFFIC PRIORITIZATION Object: Entity Attribute Meta model Data model The Analytic Hierarchy Process (AHP) is a structured technique for organizing, analyzing and making complex decisions, based on mathematics and psychology. The AHP is a multi-criteria decision-making (MCDM) approach, introduced by mathematician Thomas L. Saaty. The AHP is a decision making support tool which can be used to solve complex decision problems. It uses a multilevel hierarchical structure of objectives, criteria, subcriteria and alternatives. Important data are derived by using a set of pairwise comparisons. These comparisons are further used to obtain the weights of importance of the decision criteria and the relative performance measures of the alternatives in terms of each individual decision criterion [7]. So, the AHP is an effective MCDM tool that meets the objective of decision making by ranking the choices according to their merits. Now, upon clarifying which MCDM method will be used, it is necessary to define the AHP criteria for the prioritization of types of Web traffic to choose the optimal load-balancing system. To make a set of necessary criteria for the types of Web traffic, it is also necessary to know which kind of business environment it entails. In this case, we concentrated on the banking business environment and their Web traffic. The selected criteria are influenced by business requirements and levels of criticality of individual network services that support certain business banking models. Thus, for the bank Web traffic we defined the following evaluation and prioritization criteria, in addition to the selection of the optimal cluster node: public Web, e-banking, m-banking, e-commerce and IP POS traffic. The organization uses the AHP (Analytic Hierarchy Process) technique as a method for the prioritization of Web traffic (domain) in the cloud environment. The actual goal is to prioritize the cloud Web traffic based on its type, and to select an adequate (or optimal) alternative (web cluster node) according to the defined criteria – a kind of Web traffic tagging. Appearance data The reason for choosing a multi-criteria decisionmaking technique rather than a technique such as the SWOT analysis (Strengths, Weaknesses, Opportunities, and Threats) lies in its applicability. The SWOT analysis indeed provides a sufficient amount of quite descriptive data in regard to each element of the analysis. Yet, it does not include a defined criteria by which some individual system or solutions could be evaluated and/or prioritized. In this specific case, that would mean to choose an appropriate Web cluster system according to the type of Web traffic. Also, the final result of the evaluation done with the help of the MCDM technique provided some numeric/quantitative ratios among alternatives, i.e. the Web cluster nodes. This is certainly measurable (and absolutely necessary), unlike the descriptive information provided by the SWOT analysis, which does not facilitate the selection of the optimal cluster node. There are a few MCDM techniques available for complex decision making issues, but the AHP and TOPSIS techniques are among the most frequent techniques used today [2]. The reason for their extensive usage is their proven mathematical foundation. issues [1][20][22]. 
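The pairwise-comparison step described above can be summarized with the standard AHP relations; the block below is textbook notation (after Saaty), not anything specific to the metamodel in Table II.

% Standard AHP priority derivation: A is the n x n pairwise comparison matrix
% with a_ij = 1/a_ji, and w is the priority (weight) vector.
\begin{align}
  A\,w &= \lambda_{\max}\, w, \qquad \sum_{i=1}^{n} w_i = 1, \\
  CI &= \frac{\lambda_{\max} - n}{n - 1}, \qquad CR = \frac{CI}{RI},
\end{align}
% where CI is the consistency index, RI the random consistency index for
% matrices of order n, and a comparison set is usually accepted when CR < 0.1.

In this setting, the criteria are the traffic types listed above (public Web, e-banking, m-banking, e-commerce, IP POS) and the alternatives are the candidate Web cluster nodes.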
It is true that the TOPSIS MCDM technique also has a certain number of proven applications in multi-criteria decision making problems related with load-balancing issues, but that number is still lesser when compared to the AHP technique [13][17]. Also, according to the research, the AHP has more applications in banking industry [2]. Real world appearance farm with plenty of resources (CPU, RAM) to a less critical type of HTTP or HTTPS traffic. This would present an imminent financial risk due to inadequate managing of the available resources. Operational risk would likewise be present, as well as the reputational one, if a critical type of traffic is left without enough resources. Therefore, in the context of the cloud computing and Web traffic, the aim is to choose the optimal load-balancing system depending on the type of Web traffic. This leads us to the multi-criteria decision making problem regarding the conditions of uncertainty and risk. In order to address this complex problem, it is necessary to use one of the multiple-criteria decision making (MCDM) or decision analyses (MCDA) techniques available today. Entity: “organization” Entity: Entity: Entity: Entity: “method” “domain” “goal” “criteria” Attribute: “organization_id Attribute: Attribute: Atribut: Atribut: ” “method_id” “domain_id” “goal_id” “criteria_id” “name” “name” “description” “description” “name” “address” Organization: Method: “213” Goal: “7” “58” Name: Domain: Criteria: Name: “CloudOcean” “Web Site” Description: “14” “Analytic Address: Description: “Web Traffic Name: Hierarchy “30 Algonquin, “url” Prioritization “e-banking” Process New York, NY, ” (AHP)” USA” CloudOcean Analytic http://www.co Web Traffic Hierarchy mpany.com Prioritization Process (AHP) e-banking Entity: “alternative” Atribut: “alternative_i d” “name” Alternative: “22” Name: “Web Cluster web01corporate” Web Cluster web01corporate 233 Figure 4. Business model for the AHP method for Web traffic prioritization and the AHP method, we succeeded in developing a new hybrid metamodel capable of prioritizing Web traffic according to its type. The metamodel can also help load balancers to direct the traffic to the appropriate Web cluster node, ultimately resulting in a better resource allocation and reduction of costs which represent the ultimate goals of this research. V. EXPECTED CONTRIBUTIONS The purpose of this research is to propose a concept model combining two methods, the implementation of which would entail multiple social and business justifications for the following reasons: • • Figure 5. ER model for the AHP method for Web traffic prioritization IV. COMMON METAMODEL METHODOLOGY The Load Balancing (LB) method and the AHP method for Web traffic prioritization have only two common points – the method entities and Web traffic type (IT_service in LB_method and its corresponding table Domain in the AHP_method). However, these two common points are actually crucial for the existence of this new hybrid metamodel. By merging the LB method Wide availability - today there are many commercial solutions that are very expensive and demanding at the same time, i.e. it takes a lot of compute power for their proper operation. The purpose of this paper is to use the open source technology and solutions so that everyone can benefit from them, not only ICT businesses, but also the scientific community for research purposes. The new algorithm will also be publicly available to everyone. 
Improved utilization of resources - one of the main objectives is: "How to achieve multiple usage of the existing compute power"? As noted above, the virtualization of computing resources is present almost everywhere however, for it requires the socalled bare bone (physical resources) whose price range and amortization are quite high the new algorithm should orchestrate Web servers and their free resources on its own and allocate them where necessary. This is a drastic change compared to the commercial solutions which solve the lack of resources by configuring and delivering new instance of Web server, which ultimately consumes even Figure 6. ER model for the common metamodel 234 MIPRO 2016/DC VIS • • • • more compute power. Increased system availability - since compute power can be better utilized, this means that higher scalability and system availability are also possible. Wide use - the goal of this solution represents its broader application and the fact that the new algorithm can be applied to all types of Web services (Nginx and Apache), but also for both types of content, that is the static and the dynamic. Increased financial profitability (lower OPEX) - the main requirement for any cloud computing is greater availability. In order to achieve this, a larger and more complex infrastructure is used that is expensive to maintain. The new algorithm should allow multiple utilization of computing resources that provides even greater scalability with a lesser server density, i.e. the necessary infrastructure. Since the computing resources can be better utilized, we can assume that the new algorithm should reduce the number of physical servers which would result in savings because it would allow lower maintenance costs (less equipment and network infrastructure), a smaller number of equipment (cost of maintaining and replacing hardware), lower monthly expenses of the data center (electricity and air conditioning) and the like. Environmental protection - due to the already mentioned high costs of a single data center, very few of them are in compliance with the environmental regulations. In other words, the electricity they use for their infrastructure does not come from renewable energy sources, therefore the aim is to reduce their electricity consumption. The new algorithm should reduce the number of servers due to the possibility of reallocating computing resources which would ultimately result in lower energy consumption. VI. CONLUSION In the last decade, there has been a rapid increase regarding the usage of the cloud computing. In order for a such infrastructure to exist, a large number of data centers has been made [6][10]. Such infrastructure is quite expensive since its primary goal is to enable a continuous operation of all cloud computing services. To achieve that, the entire system must be redundant (power installations, UPS, air conditioning, network infrastructure, etc.), with a secured backup system (a power generator, a BGP connection, other geographic collocation, etc.). All these items generate extremely high CAPEX and OPEX costs which is the main motivation for improvement so as to allow for savings at all levels of the infrastructure and services [16]. Consequently, this paper proposes a new model created through a combination of two methods, which could lead to achieving the set goals along with further upgrading and improvement. For future research, we propose the development of some scripts (e.g. Python scripts) in the first phase. 
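As a sketch of what such a phase-one script might look like, the fragment below combines AHP-derived priorities for traffic classes with a simple choice of Web cluster node; the class names, priority values, node names and free-capacity figures are illustrative assumptions, not results of this paper.

# Hypothetical sketch of a "phase one" script: route traffic classes to Web
# cluster nodes according to AHP-derived priorities. All names, weights and
# capacity figures below are illustrative assumptions.

# Priorities as they might come out of an AHP run over the banking criteria
# (public Web, e-banking, m-banking, e-commerce, IP POS); they sum to ~1.
AHP_PRIORITY = {
    "ip_pos":     0.35,
    "e_banking":  0.25,
    "m_banking":  0.20,
    "e_commerce": 0.12,
    "public_web": 0.08,
}

# Candidate Web cluster nodes and their share of free capacity (0..1),
# e.g. as reported by a monitoring tool.
FREE_CAPACITY = {"web01corporate": 0.7, "web02shared": 0.3, "web03budget": 0.9}

def choose_node(traffic_class: str) -> str:
    """Give higher-priority classes first pick of the nodes with most headroom;
    lower-priority classes wrap around onto the remaining nodes."""
    ranked_nodes = sorted(FREE_CAPACITY, key=FREE_CAPACITY.get, reverse=True)
    ranked_classes = sorted(AHP_PRIORITY, key=AHP_PRIORITY.get, reverse=True)
    index = ranked_classes.index(traffic_class) % len(ranked_nodes)
    return ranked_nodes[index]

if __name__ == "__main__":
    for cls in AHP_PRIORITY:
        print(cls, "->", choose_node(cls))

A working version would read the priorities from the AHP evaluation and the free-capacity figures from the monitoring layer instead of constants.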
In the second phase, we propose the development of the entire software solution MIPRO 2016/DC VIS in a certain programming language to support this new metamodel. Also, we propose to investigate as to whether is possible to use some other multi-criteria decision making tool like TOPSIS, instead of the AHP. In case of success, that would be additional forte for this hybrid metamodel because of its modularity and scalability. REFERENCES [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] Aghazarian, V.: RQSG-I : An Optimized Real time Scheduling Algorithm for Tasks Allocation in Grid Environments. 205–210 (2011). Aruldoss, M.: A Survey on Multi Criteria Decision Making Methods and Its Applications. Am. J. Inf. Syst. 1, 1, 31–43 (2013). Celesti, A. et al.: An approach to enable cloud service providers to arrange IaaS, PaaS, and SaaS using external virtualization infrastructures. Proc. - 2011 IEEE World Congr. Serv. Serv. 2011. 607–611 (2011). Chung-Cheng Li, Kuochen Wang: An SLA-aware load balancing scheme for cloud datacenters. Int. Conf. Inf. Netw. 2014. 58–63 (2014). Detal, G. et al.: Multipath in the middle (box). Work. Hot Top. middleboxes Netw. Funct. virtualization. 1–6 (2013). Dixon, D., Basiliere, P.: Cloud Computing for the Sciences. Anal. June, 1–6 (2009). Engineering, I. et al.: Using the Analytic Hierarchy Process for Decision Making in Engineering Applications : Some Challenges. Int. J. Ind. Eng. Theory, Appl. Pract. 2, 1, 35–44 (1995). Gruber, C.G.: CAPEX and OPEX in Aggregation and Core Networks. 2009 Conf. Opt. Fiber Commun. - incudes post deadline Pap. 9–11 (2009). Islam, S.S. et al.: Cloud computing for future generation of computing technology. 2012 IEEE Int. Conf. Cyber Technol. Autom. Control. Intell. Syst. 129–134 (2012). Jadeja, Y., Modi, K.: Cloud computing - Concepts, architecture and challenges. 2012 Int. Conf. Comput. Electron. Electr. Technol. ICCEET 2012. 877–880 (2012). Knoll, T.M.: A combined CAPEX and OPEX cost model for LTE networks. (2014). Luke, S. et al.: Essentials of Metaheuristics A Set of Undergraduate Lecture Notes by BibTEX : (2011). Ma, F. et al.: Distributed load balancing allocation of virtual machine in cloud data center. Softw. Eng. Serv. Sci. (ICSESS), 2012 IEEE 3rd Int. Conf. 20 – 23 (2012). Popa, L. et al.: A Cost Comparison of Data Center Network Architectures. Proc. 6th Int. Conf. Co-NEXT ’10. (2010). Rubenstein, B., Faist, M.: Data Center Cold Aisle Set Point Optimization through Total Operating Cost Modeling. Sampson, J., Tullsen, D.M.: Battery Provisioning and Associated Costs for Data Center Power Capping. (2012). Sayari, Z., Harounabadi, A.: Evaluation and Making a Tradeoff between Load Balancing and Reliability in Grid Services using Formal Models. 127, 9, 5–10 (2015). Singh, A. et al.: Server-storage virtualization: Integration and load balancing in data centers. 2008 SC - Int. Conf. High Perform. Comput. Networking, Storage Anal. SC 2008. November, (2008). Soundararajan, V., Herndon, B.: Benchmarking a Virtualization Platform. 99–109 (2014). Wang, M. et al.: Analytic hierarchy process in load balancing for the multicast and unicast mixed services in LTE. 2012 IEEE Wirel. Commun. Netw. Conf. 2735–2740 (2012). Wiboonrat, M.: Life Cycle Cost Analysis of Data Center Project. (2014). Zhang, J.-D. et al.: Load Balancing Based on Group Analytic Hierarchy Process. 2013 Ninth Int. Conf. Comput. Intell. Secur. 758–762 (2013). 235 Showers Prediction by WRF Model above Complex Terrain T. 
Davitashvili *, N. Kutaladze **, R. Kvatadze ***, G. Mikuchadze **, Z. Modebadze *, and I. Samkharadze* * Iv. Javakhishvili Tbilisi State University/ I.Vekua Institute of Applied mathematics, Tbilisi, Georgia ** Georgian Hydro-meteorology Department, Tbilisi, Georgia *** Georgian Research and Educational Networks Association, Tbilisi, Georgia **** Iv. Javakhishvili Tbilisi State University/ Exact and Natural Sciences Faculty, Tbilisi, Georgia tedavitashvili@gmail.com Abstract - In the present article we have configured the nested grid WRF v.3.6 model for the Caucasus region, taking into consideration geographical-landscape character, topography heights, land use, soil type, temperature in deep layers, vegetation monthly distribution, albedo and others. Computations were performed using Grid system GE-01GRENA with working nodes (16 cores+, 32GB RAM on each) located at the Georgian Research and Educational Networking Association (GRENA) which had been included in the European Grid infrastructure. Therefore it was a good opportunity for running model on larger number of CPUs and storing large amount of data on the Grid storage elements. Two particulate cases of unexpected heavy showers which took place on 13th of June 2015 in Tbilisi and 20th of August 2015 in Kakheti (eastern Georgia) were studied. Simulations were performed by two set of domains with horizontal grid-point resolutions of 6.6 km and 2.2 km. The ability of the WRF model in prediction precipitations with different microphysics and convective scheme components taking into consideration complex terrain of the Georgian territory was tested. Some results of the numerical calculations performed by WRF model are presented. I. INTRODUCTION Regional climate formation above the territory of complex terrains is conditioned due to joint actions of large-scale synoptic and local atmospheric processes where the last one is basically stipulated by complex topography structure of the terrain. The territory of Caucasus and especially territory of Georgia are good examples for that. Indeed compound topographic sections (85% of the total land area of Georgia are mountains) play an impotent role for spatial-temporal distribution of meteorological fields. As known the global weather prediction models can well characterize the large scale atmospheric systems, but not enough the mesoscale processes which mainly associated with regional complex terrain and land cover. For modeling these smaller scale atmospheric processes and its characterizing features it is necessary to take into consideration the main features of the local terrain, its heterogeneous surfaces and influence of large scale atmosphere processes on the local scale processes. The Weather Research and Forecasting (WRF) 236 models are widely used by many operational services for short and medium range weather forecasting [1]. The WRF model version 3.6 represents a suitable mean for studding regional and mesoscale atmospheric processes such are: Regional Climate, Extreme Precipitations, Hails, influence of orography on mesoscale atmosphere processes, sensitivity of WRF to physics options etc. [1]. As a matter of fact the Advanced Research Weather Research and Forecasting Model (WRF-ARW) is a convenient research tool, as it offers multiple physics options and they can be combined in many ways [2]-[9]. 
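Because the physics options can be combined in many ways, sensitivity studies of the kind reviewed in the following paragraphs are usually organized as a sweep over scheme combinations. The sketch below shows one way such a sweep could be scripted; the numeric option codes follow the usual WRF namelist numbering as far as we are aware (mp_physics and cu_physics are the relevant namelist variables), and the job-submission step is only a placeholder.

# Sketch of a sensitivity sweep over WRF microphysics / cumulus combinations.
# Option codes assumed here: mp_physics 2 = Purdue Lin, 3 = WSM3, 5 = Eta Ferrier;
# cu_physics 1 = Kain-Fritsch, 2 = Betts-Miller-Janjic, 3 = Grell-Devenyi ensemble.
import itertools

MICROPHYSICS = {"WSM3": 3, "EtaFerrier": 5, "PurdueLin": 2}
CUMULUS = {"KainFritsch": 1, "BettsMillerJanjic": 2, "GrellDevenyi": 3}

def write_namelist(mp_code: int, cu_code: int, path: str) -> None:
    # Only the two options being varied are written here; a real experiment
    # would patch them into the full namelist.input of the nested domains.
    with open(path, "w") as f:
        f.write("&physics\n")
        f.write(f" mp_physics = {mp_code}, {mp_code},\n")   # parent + nest
        f.write(f" cu_physics = {cu_code}, {cu_code},\n")
        f.write("/\n")

for (mp_name, mp), (cu_name, cu) in itertools.product(MICROPHYSICS.items(),
                                                      CUMULUS.items()):
    tag = f"{mp_name}_{cu_name}"
    write_namelist(mp, cu, f"namelist.physics.{tag}")
    # Placeholder: submit the WRF run for this combination on the grid nodes.
    print("prepared", tag)

Each prepared combination could then be run over the nested domains and compared against the observed cases described later in the paper.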
Indeed in WRF-ARW model the main categories of physics parameterizations (microphysics, cumulus parameterizations, surface physics, planetary boundary layer physics and atmospheric radiation physics) mutually connected via the model state variables (potential temperature, moisture, wind, etc.) by their tendencies and via the surface fluxes [1]-[5]. Taking into account this broad availability of parameterizations it is not easy to define the right combination that better describes a meteorological phenomenon dominated above the investigated region. Many works have been dedicated to the problem of identification the best combination of parameterizations in WRF model that better represents the atmospheric conditions above the investigated region [2]-[26]. Three main combination of the microphysics parameterization schemes (WRF Single-Moment 3-class (WSM3) scheme[11], Eta Ferrier scheme [12], Purdue Lin scheme [13]) and 3 cumulus schemes (Kain-Fritsch [7], Betts-Miller-Janjic [14]-[15], Grell-Devenyi ensemble [16]) were chosen for identification which combination in WRF model was better simulated the atmospheric lightning conditions in the Brazil southeastern [5]. Analysis of numerical calculations in [5] had shown that sets of WSM3 with Kain-Fritsch schemes and Eta Ferrier with Betts-Miller-Janjic schemes, were better represented temperature and wind at surface, low and medium levels in the atmosphere, than Purdue Lin with Grell-Devenyi ensemble schemes. Although the sets of the WSM3 with Kain-Fritsch schemes and Eta Ferrier with Betts-Miller-Janjic schemes, have been highlighted again among all the combinations and the set WSM3 with MIPRO 2016/DC VIS Kain-Fritsch schemes was the one which presented the best results for the meteorological variables [5]. Using WRF model six different convective parameterization schemes (CP’s) were investigated for studying the impact CP’s on the quality of rainfall forecast over Tanzania for two rainy seasons [2]. The results of numerical experiments had showed that for extreme rainfall prediction the Betts-Miller-Janjic (BMJ) and Ensemble Grell-Devenyi (GD) Schemes gave better results than any other CP’s for different regions and seasons in Tanzania. Also WRF model to some extent performs better in the cases of extreme rainfall [2]. Eight microphysics schemes (Lin, WSM5, Eta, WSM6, Goddard, Thompson, WDM5, WDM6) and three DA techniques were examined in the WRF-ARW for reproduction observed strong convection and 3 heavy precipitations over the US SGP for events of 27_31 May 2001 [3]. To reproduce vertical structure and evolution of the warm-season convective events (27_31 May 2001) high temporal resolution data and cloud measurements millimetre cloud radar (MMCR) were used. From all simulation results an important deduction was that quite accurate reproductions of lower tropospheric moisture, temperature and wind profiles were necessary for the successful application, implementation of cloud-resolving regional models to simulate deep convection [3]. In [4] results of 24 h. predictions by WRF model had shown that the KainFritsch BMJ and Grell schemes were not in satisfactory quality and the simplified Arakawa cumulus parameterization scheme with Lin et al. micro-physics scheme, RUC Land surface model, Asymmetric Convective Model (ACM2), planetary boundary layer physics gave a better result than others for produce reasonable predictions in the Kelani River basin in Sri Lanka [4]. 
The sensitivity of quantitative precipitation forecasts to various modifications of the KF scheme, and the determination of the grid spacing at which the KF scheme may no longer be needed for simulated precipitation, were studied in [17]. The Kain-Fritsch scheme [7], [18], [19] is frequently used as a convective parameterization to improve forecasts at grid spacings below 20 km, likely because the KF scheme has been shown to perform convective parameterization better than other CPSs such as the Betts-Miller-Janjic and Grell-Devenyi schemes [8], [9], [10]. The KF scheme also outperformed a 4 km simulation that used no convective scheme, so it can be used to improve forecasts even at such high resolutions [8], [17], [20]. Atmospheric processes and physics parameterizations for the Caucasus territory have previously been tested with the WRF model using the following schemes: the WSM 3-class simple ice scheme, the RRTM scheme, the Dudhia scheme, the unified Noah land-surface model, the Yonsei University scheme and the Kain-Fritsch (new Eta) scheme [25], [26]. In this study, WRF is used to predict heavy showers and hail with different sets of physics options over regions characterized by complex topography. Mesoscale convective systems (MCS) have been studied using real data and WRF simulations with grid spacings in the range from 2.2 km to 19.8 km, with an emphasis on 2.2 km. The ability of the WRF model to predict precipitation with different microphysics and convective scheme components, taking into consideration the complex terrain of the Georgian territory, has been tested.

II. DATA AND METHODOLOGY

A. Observational data

Hydro-meteorological observations (HMO) and research in Georgia began in 1887, and by 1974 they were performed at 33 HMO stations allocated in 11 towns of Georgia. Since 1992 some problems have arisen in the HMO system due to the political and economic processes developing on the territory of Georgia: the number of HMO decreased, and the functioning stations lacked modern devices. At present, air quality monitoring is performed by the National Agency of Environment, under whose jurisdiction are 7 observation stations distributed in 5 cities of Georgia: Tbilisi and Rustavi (eastern Georgia), and Kutaisi, Zestafoni and Batumi (western Georgia). Each city has only 1 or 2 observation stations; the only exception is the capital of Georgia, Tbilisi, where over the last ten-year period observations were carried out at 8 posts located in different districts of the city. It is obvious that this number of stations is not enough for an assessment of the hydro-atmospheric state over the territory of Georgia; in fact, hydrometeorological information is available only for the separate areas where the stations are located. We obtained and analyzed only scant information on air temperature, air humidity, precipitation amount and wind (speed, direction) for 13-14 June and 20-21 August 2015 in Tbilisi. All the above data were obtained from the Hydro-meteorology Department of Georgia and from the meteorological post of Tbilisi State University. We also analyzed the information from the radars located in the Kakheti region on cloud structure; these data have been used for the assessment of the WRF model results.

B. Observed convective events during 13-14 June 2015

The weather on the night of 13 to 14 June 2015 in Tbilisi was severe, with showers, thunderstorms and lightning.
According to the Department of Hydrometeorology of Georgia, the maximum temperature on 13 June reached 29 °C. A wave-like transfer of heat from the south stipulated the high temperature, showers, thunderstorms and lightning in Tbilisi. A shocking accident took place on the night of 13 to 14 June 2015 in Tbilisi. Late on 13 June 2015 there was a heavy shower lasting 1.5-2 hours, and following the heavy rainfall a landslide was released above the village of Akhaldaba, about 20 km southwest of Tbilisi. About 1 million m3 of collapsed land, mud, rocks and trees moved down from the Akhaldaba mountain towards Tbilisi and dammed up the Vere river (the Vere flows from the Akhaldaba mountain through the territory of the Tbilisi Zoo and discharges into the Kura river through a tunnel under Heroes' Square). A big wave of slush, rocks and trees ran across the Vere canyon and washed everything away up to Heroes' Square. The resulting flood inflicted severe damage on the Tbilisi Zoo, Heroes' Square and nearby streets and houses (see Fig. 1). This event resulted in at least 20 deaths, including three zoo workers, and left half of the Tbilisi Zoo's animals either dead or on the loose.

Figure 1. After the flood of 14 June 2015 in Tbilisi.

C. Observed convective events during 20-21 August 2015

Western atmospheric processes dominated above the territory of Georgia from 19 to 21 August 2015. Inner-mass processes developed above Tbilisi, and it hailed in the evening of 19 August 2015. In the evening of 20 August 2015 it rained with thunderstorms and lightning in Tbilisi. The maximum temperature reached 36 °C on 20 August and 31 °C on 21 August. Also on 20 August 2015 heavy rainfall was observed above the Kakheti region (a famous wine-making region of eastern Georgia). Downpours with hail caused destruction in some districts of Kakheti and in the resort suburbs of Tbilisi, Kojori and Kiketi, where the ground floors of many houses were flooded in the evening of 20 August 2015. The violent rain with hail lasted for half an hour and, in some settlements of the Gurjaani, Lagodekhi and Kvareli districts, broke the roofs and even walls of houses. As a result, 50 percent of crops were destroyed and many fruit trees in orchards were damaged (source: http://eng.kavkazuzel.ru/articles/21952/). The previous largest hail storm in Kakheti was recorded in July 2012, when 100 percent of all crops were destroyed in several villages; hundreds of houses were left without roofs and many cattle died. Hail has always been a major issue for people in Kakheti. Every year a large portion of the agricultural sector, particularly grapes, is damaged by hail, leaving farmers with no crops. It became necessary to put some measures against the hail problem into practice: more than 80 anti-hail firing points and radars have been installed in Kakheti to reduce the damage to crops. From the radars allocated in the Kakheti region we obtained the following information on the progression of cloud characteristics in the atmospheric column over the investigated region. At 19:00 on 20 August, a cloud system with the appearance of an atmospheric front broke out south-west of the radar system and moved towards the north-east, weakening from 19:42. At 19:20 a new cloud system formed and began moving from the north-west towards the town of Akhmeta; at 19:49 it reached Akhmeta, and the atmospheric column over the region reached a height of 15 km with a maximal reflection of 60 dB. At 20:18 the cloud system, with a height of 15 km and maximal reflection of 60 dB, continued moving from the territory of Akhmeta in the south-east direction, and at 21:13 it reached the territory of Kvareli with a height of 16 km and maximal reflection of 60 dB. The cloud system continued migrating, and at 23:07 it shifted to the Lagodekhi territory, where its height and reflection began to decrease. At 01:59 a new cloud system formed and began moving from the north-west to the north-east; at 02:19 it had a height of 10 km and a maximal reflection of 50 dB, at 02:42 it reached the Akhmeta territory, continued moving towards the north-east and left the investigated region.

D. WRF model simulation design

The Advanced Research WRF (ARW) model version 3.6 was used to simulate the warm-season heavy precipitation cases of 13-14 June 2015 and 19-21 August 2015. Some details of the numerical schemes are presented below. We used one-way nested domains centered on the territory of Georgia: the simulations were performed with a set of two domains with horizontal grid-point resolutions of 6.6 km and 2.2 km, both defined as those currently used for operational forecasts. The coarser domain has a grid of 94x102 points covering the South Caucasus region, while the nested inner domain has a grid of 70x70 points covering mainly the territory of Georgia. Both use 54 vertical levels, including 8 levels below 2 km. A time step of 10 seconds was used for the nested domain. The WRF model contains a number of different physics options, such as microphysics, cumulus parameterization, radiation physics, surface layer physics, land surface physics, and planetary boundary layer physics. Microphysics explicitly resolve water vapor, cloud, and precipitation processes [1]. The available microphysics options include the Kessler scheme, Purdue Lin scheme, WRF Single-Moment 3-class (WSM3) scheme, WSM5 scheme, WSM6 scheme, Eta grid-scale cloud and precipitation scheme, Thompson et al. scheme, Goddard cumulus ensemble model scheme and Morrison 2-moment scheme. In our study we chose the WSM6, Thompson, Purdue Lin, Morrison 2-Moment and Goddard schemes. Cumulus parameterization schemes are responsible for the sub-grid-scale effects of convective and/or shallow clouds and are theoretically valid only for coarser grid sizes [1]. The available cumulus parameterization schemes are the Kain-Fritsch scheme, Betts-Miller-Janjic scheme, Grell-Devenyi ensemble scheme, Grell-3D ensemble scheme and simplified Arakawa scheme; we chose the Kain-Fritsch, Betts-Miller-Janjic and Grell-Devenyi ensemble schemes for our experiments. The planetary boundary layer (PBL) schemes are responsible for the vertical sub-grid-scale fluxes due to eddy transports in the whole atmospheric column [1]. The parameterization of the PBL directly influences vertical wind shear as well as precipitation evolution [21], [22]. The work [23] summarized the main characteristics that explain the differences among the WRF PBL schemes and investigated how the PBL evolves within the ARW using 4-km grid spacing.
There are several PBL schemes, such as the Yonsei University scheme, Mellor-Yamada-Janjic scheme, MRF scheme, Asymmetric Convective Model, Quasi-Normal Scale Elimination scheme and Mellor-Yamada Nakanishi and Niino scheme. Following [23], we mainly chose the Yonsei University scheme. The land-surface models use the atmospheric information from the surface layer scheme, the radiative forcing from the radiation scheme, and the precipitation forcing from the microphysics and convective schemes, together with internal information on the land state variables and land-surface properties, to provide heat and moisture fluxes over land and sea-ice points [2]. The five-layer thermal diffusion model, Noah Land Surface Model, RUC Land Surface Model and Pleim-Xiu Land Surface Model are the available land-surface models; we chose the Noah Land Surface Model. After considering various combinations of microphysics, cumulus parameterization, land-surface physics and planetary boundary layer physics, the combinations used in our experiments are given in Table 1.

TABLE 1. Five sets of the WRF parameterizations used in this study.
Set 1: microphysics WSM6; cumulus parameterization Kain-Fritsch; surface layer MM5 Similarity; PBL YSU; land surface Noah LSM; atmospheric radiation RRTM/Dudhia.
Set 2: microphysics Thompson; cumulus parameterization Betts-Miller-Janjic; surface layer MM5 Similarity; PBL YSU; land surface Noah LSM; atmospheric radiation RRTM/Dudhia.
Set 3: microphysics Purdue Lin; cumulus parameterization Kain-Fritsch; surface layer MM5 Similarity; PBL YSU; land surface Noah LSM; atmospheric radiation RRTM/Dudhia.
Set 4: microphysics Morrison 2-Moment; cumulus parameterization Grell-Devenyi ensemble; surface layer PX Similarity; PBL ACM2; land surface Noah LSM; atmospheric radiation RRTM/Dudhia.
Set 5: microphysics Goddard; cumulus parameterization Kain-Fritsch; surface layer MM5 Similarity; PBL YSU; land surface Noah LSM; atmospheric radiation RRTM/Dudhia.
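For readers who script their experiments, the five parameterization sets of Table 1 and the domain setup of Section II.D can be captured in a simple configuration structure. The Python sketch below uses descriptive scheme names only and deliberately avoids guessing the numeric WRF namelist codes; it is an illustration, not the configuration actually used in this study.

```python
# Illustrative encoding of the five physics-option sets of Table 1.
# Scheme names are descriptive strings; mapping them to actual WRF namelist
# values is left to the user, since those codes are not given in the text.
COMMON = {
    "land_surface": "Noah LSM",
    "radiation": "RRTM (longwave) / Dudhia (shortwave)",
}

PHYSICS_SETS = {
    "Set 1": {"microphysics": "WSM6",              "cumulus": "Kain-Fritsch",
              "surface_layer": "MM5 Similarity",   "pbl": "YSU", **COMMON},
    "Set 2": {"microphysics": "Thompson",          "cumulus": "Betts-Miller-Janjic",
              "surface_layer": "MM5 Similarity",   "pbl": "YSU", **COMMON},
    "Set 3": {"microphysics": "Purdue Lin",        "cumulus": "Kain-Fritsch",
              "surface_layer": "MM5 Similarity",   "pbl": "YSU", **COMMON},
    "Set 4": {"microphysics": "Morrison 2-Moment", "cumulus": "Grell-Devenyi ensemble",
              "surface_layer": "PX Similarity",    "pbl": "ACM2", **COMMON},
    "Set 5": {"microphysics": "Goddard",           "cumulus": "Kain-Fritsch",
              "surface_layer": "MM5 Similarity",   "pbl": "YSU", **COMMON},
}

DOMAINS = {  # one-way nesting, as described in Section II.D
    "d01": {"dx_km": 6.6, "grid": (94, 102), "vertical_levels": 54},
    "d02": {"dx_km": 2.2, "grid": (70, 70),  "vertical_levels": 54, "time_step_s": 10},
}

for name, cfg in PHYSICS_SETS.items():
    print(name, "->", cfg["microphysics"], "+", cfg["cumulus"])
```

Keeping the experiment matrix in one structure like this makes it straightforward to iterate over the five sets when generating runs for both domains.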
III. RESULTS AND DISCUSSION

Simulations of the two accidental atmospheric events that occurred during June and August 2015 over the territory of Georgia were performed with the WRF-ARW model on the basis of the five different combinations of physics options (microphysics, cumulus parameterization, radiation physics, surface layer physics, land surface physics, and planetary boundary layer physics) presented in Table 1. The numerical calculations showed that none of the combinations listed in Table 1 was able to model the true atmospheric event which took place on 13 June 2015: the 24 h predictions by these schemes were not of satisfactory quality, as they were not able to account for the small-scale processes that lead to the development of deep convection. For example, Fig. 2 and Fig. 3 present the predicted fields of relative humidity at 850 hPa for 13 June (21 UTC) and 14 June (00 UTC) 2015, respectively, simulated with the WRF physics options of Set 1 (which gave better results than the others). The calculated amounts of water vapor presented in Fig. 2 and Fig. 3 (nested domain with 6.6 km resolution) at the moments when the atmospheric event was in full swing are not in satisfactory agreement with the real situation that took place in Tbilisi and its surroundings on 13 June 2015.

Figure 2. Map of the relative humidity at 850 hPa for 13 June 2015 (21 UTC) simulated for the nested domain with 6.6 km resolution.

Figure 3. Map of the relative humidity at 850 hPa for 14 June 2015 (00 UTC) simulated for the nested domain with 6.6 km resolution.

Figures 4 and 5 present the forecasted precipitation fields at the 850 hPa level for 13 June 2015 (21 UTC) and 14 June 2015 (00 UTC), respectively. Both figures demonstrate the failure of the 24 h WRF-ARW forecast, especially in the investigated region, where both nested models predicted almost dry conditions (insignificant precipitation). A comparison of Fig. 4 with Fig. 5 shows a considerably increased amount of accumulated precipitation at the Black Sea coastal area near the city of Poti, but in the investigated area Fig. 5 shows a diffuse spectrum of accumulated precipitation in comparison with Fig. 4. Unfortunately, for this case study none of the precipitation simulated in the region of interest (Tbilisi and its suburbs) was convective in nature, and only a small amount of precipitation was produced by the model. As is known, it is the CPSs, not the microphysics, that produce the convective precipitation [8], so this indicates that the chosen CPSs were not producing precipitation for this mesoscale case study. An important deduction from all the simulation results was that quite accurate reproductions of the lower-tropospheric temperature and wind profiles were obtained, but these alone were not enough for the successful simulation of the mesoscale deep convection which took place on 13 June 2015 in Tbilisi and its suburbs. In our opinion, it is necessary to strengthen the initial and boundary conditions through data assimilation and to improve the physical linkages between the radiation physics, surface layer physics and land surface physics.

Fig. 4. Forecasted (13 June 00 UTC) accumulated precipitation, 12 h sum, simulated for the nested domain with 6.6 km resolution.

Fig. 5. Forecasted (14 June 00 UTC) accumulated precipitation, 12 h sum, for the nested domain with 1-way nesting and 6.6 km resolution.

The WRF-ARW model, version 3.6, was also used to simulate the shower which took place on 20 August with the five different combinations of physical schemes (see Table 1). The numerical calculations showed that in all cases orographic forcing plays an important role in the localization and intensification of precipitation in and near complex terrain. In the pre-convective environment, pronounced small-scale secondary circulations develop during the day, caused by the differential heating of valleys and mountain slopes. They lead to the development of convergence lines, e.g. over mountain crests, and of mesoscale vortices influencing the atmospheric boundary layer evolution as well as the transport of humidity and other tracers. The numerical calculations showed that the combination of the Purdue Lin scheme with the Kain-Fritsch scheme and MM5 Similarity surface layer (Set 3) and of the Goddard scheme with the Kain-Fritsch scheme and MM5 Similarity surface layer (Set 5) gave better results than the others. Selected convective cases for Set 3 and Set 5 are shown in Fig. 6 and Fig. 7, respectively. The numerical calculations also showed that there are indeed 'natural' scales of activity for the convective parameterization within WRF.

Fig. 6. Forecasted (Set 3, 20 August 21 UTC) accumulated precipitation, 12 h sum, for the nested domain with 2.2 km resolution.

Fig. 7. Forecasted (Set 5, 20 August 21 UTC) accumulated precipitation, 12 h sum, for the nested domain with 2.2 km resolution.

A comparison of Fig. 6 with Fig. 7 shows that the main features of the accumulated precipitation are predicted almost identically, but a more accurate study of the dynamics and its comparison with the observational data showed that Set 3 was able to model the true atmospheric event which took place on 20-21 August 2015.
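The verification called for in the conclusion can start from simple categorical scores. The sketch below is a generic illustration that is not part of this study's workflow: it builds a 2x2 contingency table for an assumed 1 mm event threshold and computes the standard probability of detection (POD) and false alarm ratio (FAR) from forecast and observed accumulations at station points.

```python
# Generic categorical verification of precipitation forecasts at station points.
# `forecast` and `observed` are same-length sequences of 12 h accumulations (mm);
# the 1 mm event threshold and the sample values are arbitrary examples.
def contingency(forecast, observed, threshold=1.0):
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecast, observed):
        fe, oe = f >= threshold, o >= threshold
        if fe and oe:
            hits += 1
        elif not fe and oe:
            misses += 1
        elif fe and not oe:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives

def pod_far(hits, misses, false_alarms):
    # POD: fraction of observed events that were forecast; FAR: fraction of
    # forecast events that did not occur.
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    return pod, far

h, m, fa, cn = contingency([0.2, 5.1, 12.0, 0.0], [0.0, 8.4, 2.3, 1.6])
print(pod_far(h, m, fa))
```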
In summary, it can be said that the above-mentioned model can be successfully used for the prediction of local weather extremes for western-type synoptic processes such as the 19-21 August atmospheric circulation above the Georgian territory.

IV. CONCLUSION

In this study some comparisons between WRF forecasts were made in order to check the consistency and quality of the WRF model for days with heavy precipitation over the territory of Georgia. This first analysis allowed us to verify that, in general, the combinations of the Purdue Lin scheme with the Kain-Fritsch scheme and MM5 Similarity surface layer (Set 3) and of the Goddard scheme with the Kain-Fritsch scheme and MM5 Similarity surface layer (Set 5) gave better results than the others for the western atmospheric processes dominating above the territory of Georgia. For the further evolution and improvement of the model skill at different temporal and spatial scales, verification and assimilation methods should be used for further tuning and fitting of the model to local conditions.

ACKNOWLEDGMENT

The authors are supported by the EU Commission H2020 project "VRE for regional Interdisciplinary communities in Southeast Europe and the Eastern Mediterranean" (VI-SEEM), №675121.

REFERENCES
[1] W. C. Skamarock, J. B. Klemp, J. Dudhia, D. O. Gill, D. M. Barker, W. Wang, J. G. Powers, "A description of the Advanced Research WRF Version 2," NCAR Tech. Note, Natl. Cent. for Atmos. Res., Boulder, Colorado, 2005.
[2] A. L. Kondowe, "Impact of Convective Parameterization Schemes on the Quality of Rainfall Forecast over Tanzania Using WRF Model," Natural Science, vol. 6, 2014, pp. 691-699.
[3] Z. T. Segele, L. M. Leslie and P. J. Lamb, "Weather Research and Forecasting Model simulations of extended warm-season heavy precipitation episode over the US Southern Great Plains: data assimilation and microphysics sensitivity experiments," Tellus A, vol. 65, 2013, pp. 1-26.
[4] G. T. De Silva, S. Herath, S. B. Weerakoon and U. R. Rathnayake, "Application of WRF with different cumulus parameterization schemes for precipitation forecasting in a tropical river basin," Proceedings of the 13th Asian Congress of Fluid Mechanics, 17-21 December 2010, Dhaka, Bangladesh, pp. 513-516.
[5] G. S. Zepka and O. Pinto Jr., "A Method to Identify the Better WRF Parameterizations Set to Describe Lightning Occurrence," 3rd Meteorological Lightning Conference, 21-22 April 2010, Orlando, Florida, USA, pp. 1-10.
[6] Y. Yair, B. Lynn, C. Price, V. Kotroni, K. Lagouvardos, E. Morin, A. Mugnai and M. d. C. Llasat, "Predicting the potential for lightning activity in Mediterranean storms based on the Weather Research and Forecasting (WRF) model dynamic and microphysical fields," J. Geophys. Res., vol. 115, D04205, doi:10.1029/2008JD010868.
[7] J. S. Kain, "The Kain–Fritsch Convective Parameterization: An Update," J. Appl. Meteor., vol. 43, 2004, pp. 170–181.
[8] E. K. Gilliland and C. M. Rowe, "A comparison of cumulus parameterization schemes in the WRF model," Preprints, 21st Conf. on Hydrology, San Antonio, TX, Amer. Meteor. Soc., 2007, P2.16.
[9] W. Wang and N. L. Seaman, "A comparison study of convective parameterization schemes in a mesoscale model," Mon. Wea. Rev., vol. 125, 1997, pp. 252-278.
[10] L.-M. Ma and Z.-M. Tan, "Improving the behavior of the cumulus parameterization for tropical cyclone prediction: convection trigger," Atmos. Res., vol. 92, 2009, pp. 190-211.
[11] S.-Y. Hong, J. Dudhia, and S.-H. Chen, "A Revised Approach to Ice Microphysical Processes for the Bulk Parameterization of Clouds and Precipitation," Mon. Wea. Rev., vol. 132, 2004, pp. 103–120.
[12] B. S. Ferrier, Y. Jin, Y. Lin, T. Black, E. Rogers, G. DiMego, "Implementation of a new grid-scale cloud and precipitation scheme in the NCEP Eta model," Preprints, 15th Conf. on Numerical Weather Prediction, San Antonio, TX, Amer. Meteor. Soc., 2002, pp. 280–283.
[13] Y. L. Lin, R. D. Farley, H. D. Orville, "Bulk Parameterization of the Snow Field in a Cloud Model," J. Appl. Meteor., vol. 22, 1983, pp. 1065–1092.
[14] Z. I. Janjić, "The Step-Mountain Eta Coordinate Model: Further Developments of the Convection, Viscous Sublayer, and Turbulence Closure Schemes," Mon. Wea. Rev., vol. 122, 1994, pp. 927–945.
[15] Z. I. Janjić, "Comments on 'Development and Evaluation of a Convection Scheme for Use in Climate Models'," J. Atmos. Sci., vol. 57, 2000, p. 3686.
[16] G. A. Grell and D. Dévényi, "A generalized approach to parameterizing convection combining ensemble and data assimilation techniques," Geophys. Res. Lett., vol. 29, no. 14, 1693, 2002, doi:10.1029/2002GL015311.
[17] J. D. Duda, "WRF simulations of mesoscale convective systems at convection-allowing resolutions," Graduate Theses and Dissertations, Paper 10272, 2011.
[18] J. S. Kain and J. M. Fritsch, "The role of the convective 'trigger function' in numerical forecasts of mesoscale convective systems," Meteor. Atmos. Phys., vol. 49, 1992, pp. 93–106.
[19] J. S. Kain and J. M. Fritsch, "Convective parameterization for mesoscale models: The Kain–Fritsch scheme," The Representation of Cumulus Convection in Numerical Models, Meteor. Monogr., no. 24, Amer. Meteor. Soc., 1993, pp. 165–170.
[20] S. Yavinchan, R. H. B. Exell, and D. Sukawat, "Convective parameterization in a model for the prediction of heavy rain in southern Thailand," J. Meteor. Soc. Japan, vol. 89A, 2011, pp. 201–224.
[21] A. S. Monin, A. M. Obukhov, "Basic laws of turbulent mixing in the surface layer of the atmosphere," Tr. Geofiz. Inst., Akad. Nauk SSSR, vol. 24, 1954, pp. 163–187 (in Russian).
[22] S.-Y. Hong and J.-O. J. Lim, "The WRF single-moment 6-class microphysics scheme (WSM6)," J. Korean Meteor. Soc., vol. 42, 2006, pp. 129–151.
[23] A. E. Cohen, S. M. Cavallo, M. C. Coniglio, H. E. Brooks, "A Review of Planetary Boundary Layer Parameterization Schemes and Their Sensitivity in Simulating Southeastern U.S. Cold Season Severe Weather Environments," Wea. Forecasting, vol. 30, 2015, pp. 591-612.
[24] I. Jankov, W. A. Gallus, M. Segal, B. Shaw, and S. E. Koch, "The impact of different WRF Model physical parameterizations and their interactions on warm season MCS rainfall," Wea. Forecasting, vol. 20, 2005, pp. 1048–1060, doi:10.1175/WAF888.1.
[25] T. Davitashvili, R. Kvatadze, N. Kutaladze, "Weather Prediction Over Caucasus Region Using WRF-ARW Model," Proceedings of the 34th International Convention MIPRO, Opatija, Croatia, 2011, Print ISBN: 978-1-4577-0996-8, pp. 326-330.
[26] T. Davitashvili, G. Kobiashvili, R. Kvatadze, N. Kutaladze, G. Mikuchadze, "WRF-ARW Application for Georgia," Report of SEE-GRID-SCI User Forum, Istanbul, Turkey, 2009, pp. 7-10.

Methods and Tools to Increase Fault Tolerance of High-Performance Computing Systems

I.A. Sidorov
Matrosov Institute for System Dynamics and Control Theory of SB RAS, Irkutsk, Russia
ivan.sidorov@icc.ru

Abstract - This work focuses on the development of models, methods, and tools to increase the fault tolerance of high-performance computing systems.
The described models and methods are based on automatic diagnostics of the basic software and hardware components of these systems, the use of automatic localization and correction of faults, and the use of automatic HPC-system reconfiguration mechanisms. The originality and novelty of the proposed approach consist in creating a multi-agent system with universal software agents capable of collecting node state data for analysis and thereby enabling the agent to make the necessary decisions directly.

I. INTRODUCTION

The solution of resource-intensive scientific problems using high-performance computing (HPC) systems is complicated by a number of difficulties faced by the applied specialist. Examples of such difficulties include requirements to optimize parallel programs (PP) for the architecture of the HPC-system; complexity in the decomposition of the initial data; the necessity to select a specific queue type from the list of the distributed resource manager; limitations imposed by the administrative policies of the HPC-system (task priorities, limits on the number of tasks in the queue, limits on the maximum execution time, etc.); and other conditions required for the preparation and execution of the PP. One of the main problems of large-scale computational experiments that require a large number of simultaneously involved computing resources is providing fault tolerance for the execution of the computing process, because PP execution failures directly affect economic performance: fees for the computing resources, penalties for disrupted experiment deadlines, the cost of the labor spent searching for problems and failures, etc. The reasons for PP execution failures can be classified into the following categories: mistakes made by the developer in the PP code; mistakes made by the user when preparing the initial data; and failures of the hardware and software components of the HPC-system. The first two categories of failures can in some cases be identified and eliminated during debugging on a small number of nodes. The subsequent launch of large-scale computational experiments on a large number of nodes often leads to failures due to technical problems of the computational nodes: the failure of a hardware or software component on one node (processor, memory, hard drives, network devices, system services, etc.) leads to the total failure of the PP on all involved nodes. Checkpoint mechanisms can improve the reliability of the calculations; however, when a program executes on a large number of computing nodes, the effective load is significantly reduced, because the PP spends most of the CPU time organizing checkpoints and restoring after failures [1, 2]. In this regard, creating and applying effective methods and tools that increase the fault tolerance of HPC-systems, and in turn reduce the number of failures of the PPs executing on these systems, is a relevant and important task [3]. One approach to this problem is the use of monitoring and diagnostics systems. Known monitoring systems (Ganglia [4], Nagios, Zabbix) focus primarily on the collection of data and on providing the operator with aggregated information about the status of the HPC-system components. More attention is given to tools that monitor PP execution in order to collect comprehensive data about the effectiveness of the use of computing resources (Lapta [5], mpiP, IPM).
There are a number of commercial tools for the control of an HPC-system engineering infrastructure (ClustrX Watch, EMC ViPR SRM, HP OpenView [6]) and open-source tools for the control of an HPC-system computing infrastructure (Iaso [7] and Octorun [8]). However, the level of automation at all stages of monitoring, diagnostics, and troubleshooting is unacceptably low and does not support the required level of HPC-system reliability. Moreover, a big problem of the existing approaches to creating monitoring and diagnostics systems for large-scale HPC-systems (and, in the future, for exascale systems) is their strictly hierarchical architecture. In accordance with this architecture, the clients of the monitoring system send data to the central (or intermediate) nodes, where the data are collected and processed and, if necessary, control actions are generated. The hierarchical architecture imposes significant limitations on the scalability of the monitoring system, leading to an overload of the network components and data storage systems and, as a result, to a late reaction to important and critical events. This paper proposes a model, methods, and tools to increase the fault tolerance of HPC-systems with automatic monitoring and intelligent diagnostics of software and hardware components. The originality and novelty of the proposed approach consist in creating a multi-agent system with universal software agents capable of collecting node state data for analysis and of decision-making directly on the agent side.

II. MODEL AND METHODS OF THE COMPUTING NODE DIAGNOSTICS

The model of computing node diagnostics can be represented by the following structure:

S = ⟨O, Z, T, C, F, R, P, Q, L, I⟩,

where:
- O is the set of node components;
- Z is the set of measured characteristics for each node component;
- T is the set of measured characteristic types;
- C is the set of predicates;
- F is the set of control-diagnostics operations;
- R is the set of control-diagnostics operation types;
- P is the set of production rules;
- Q is the queue of control-diagnostics operations that are ready for execution;
- L is the log of the diagnostics process; and
- I represents the intervals for starting the diagnostics process on the node.

The elements of the set O are the various components of an HPC-system node whose failure may result in the failure of the PP instances being executed on this node. The elements of O can be hardware components (active cooling system, processor, memory banks, hard disk drives, network devices, etc.) and software components (operating system services and processes, operating system network interfaces, connections to network storages, etc.).

The characteristics of Z are divided into the following categories:
- workload characteristics of node components (workload of the CPU, CPU cores, memory, network and hard-drive input/output, etc.);
- characteristics of the physical state of node components (temperature of the CPU and motherboard, uninterruptible power supply state, hard drive state, etc.);
- characteristics of the state of executed programs (priority, CPU time used, memory usage, hard-drive and network storage utilization, etc.).

The elements of T include the following value types of measured characteristics: logical (Boolean); integer; float; percent (unsigned integer in the interval [0…100]); chars; string (up to 256 characters); text (up to 2^20 characters).

The elements of F include operations designed to perform control actions to collect data on the states of the hardware and software components and diagnostic actions to localize and troubleshoot the identified faults. Control-diagnostic operations implement the mappings F: O × Z → Z or F: O → Z.

The elements of R include the following types of control-diagnostic operations:
- r1 – data collecting operations (obtaining data about the current state of active cooling systems, the temperature of the motherboard and processors, the hard-drive SMART info, the amount of free memory on storage devices, the status of network devices, the status of operating system services, the status of connections to the network drives, etc.);
- r2 – test operations (fast or full memory tests, testing of the hard drives for bad blocks, testing of the network components, etc.);
- r3 – fault localization operations (identifying the number of a failed RAM bank; identifying processes using a lot of memory or generating a huge workload for hard-disk drives or network components; identifying the physical location of a failed hard drive; identifying the number of a failed cooler on the motherboard, etc.);
- r4 – troubleshooting operations (cleaning temporary files from storage devices, restarting operating system services, remounting network drives, etc.);
- r5 – troubleshooting verification operations (checking the free disk space, checking the free amount of RAM, checking stopped operating system processes, checking the temperature of the CPU, etc.);
- r6 – critical situation operations (node shutdown, disconnecting the node from the HPC-system resource manager, killing all computing processes to decrease the CPU temperature, sending a notification to the system administrator, etc.).

The type of an operation defines the priority of its execution in the operations queue Q: r1 – priority 1, r2 – priority 2, r3 – priority 3, r4 – priority 4, r5 – priority 5, r6 – priority 0 (maximal). It should be noted that operations of types r1, r2, r3, and r5 are classified as data-obtaining operations, which collect information about the state of the node components without making any changes. Operations of types r4 and r6 are designed to perform automatic control actions aimed at troubleshooting problems in software and hardware components. Control actions can change the mode of the node components in order to provide fault-tolerant functioning of this node in the HPC-system. If a triggered critical situation cannot be resolved, the control actions should automatically withdraw the defective node from the pool of available resources of the HPC-system.

The analysis of the control-diagnostic characteristics from the set Z is performed by the predicates from the set C, written as follows:

c_i: z_j ⟨logical operator⟩ const,

where c_i ∈ C, z_j ∈ Z, i = 1, …, n_C, j = 1, …, n_Z. The left side specifies the characteristic of the set Z whose value is to be analyzed. The logical operators that can be used are >, <, >=, <=, == and !=. The type of the constant on the right side and the type of the characteristic value z_j should be the same. The construction of complex expressions is allowed using the operators AND, OR, XOR, and NOT. The set P contains production rules of the form IF c_k THEN f_i, interpreted as follows: if the predicate c_k is true, it is necessary to perform the operation f_i.
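As a rough illustration only (not the authors' implementation, which Section IV describes as written in C++ with an ECMA Script-based rule language), the sets C, F and P and the priority mapping of the operation types could be encoded as follows; all names and the simplified comparison logic are assumptions made for this Python sketch.

```python
# Illustrative sketch: a minimal encoding of predicates c_i, operations f_i with
# types r1-r6, and productions of the form IF c THEN f.
from dataclasses import dataclass
from typing import Callable, Dict

# Priority of each operation type in the queue Q (r6 is the most urgent).
PRIORITY = {"r1": 1, "r2": 2, "r3": 3, "r4": 4, "r5": 5, "r6": 0}

@dataclass
class Operation:
    name: str
    rtype: str                      # one of "r1".."r6"
    component: str                  # the element of O the operation works on
    action: Callable[[Dict], Dict]  # returns newly measured characteristic values

@dataclass
class Predicate:
    characteristic: str             # z_j, a key in the characteristics vector
    operator: str                   # one of >, <, >=, <=, ==, !=
    const: object

    def ready(self, z: Dict) -> bool:
        # A predicate can be evaluated only when its characteristic is defined.
        return self.characteristic in z

    def holds(self, z: Dict) -> bool:
        ops = {">": lambda a, b: a > b, "<": lambda a, b: a < b,
               ">=": lambda a, b: a >= b, "<=": lambda a, b: a <= b,
               "==": lambda a, b: a == b, "!=": lambda a, b: a != b}
        return ops[self.operator](z[self.characteristic], self.const)

@dataclass
class Production:                   # IF predicate THEN operation
    predicate: Predicate
    operation: Operation
    completed: bool = False
```

A production is interpreted by first checking ready() and then holds(); the interpretation loop itself is sketched after the next passage.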
A production is ready when the values of all characteristics included in its predicate are defined. A production is completed if its predicate is true and the operation on its right side has been added to the queue. The process of diagnostics of a computing node includes the following steps. At some interval from I, the monitoring agents working on the nodes run the diagnostic procedure. The initial data of the diagnostics is a vector containing the values of the logical characteristics from Z, which depend on the current interval from I and define the mode of the diagnostic process (for example, whether or not to use troubleshooting operations) and its quality (quick testing, full testing or advanced testing). There is a certain order for the interpretation of productions and the addition of operations to the queue. As stated earlier, there are six operation types defined in the model. In the first step, the ready productions whose right side contains control-diagnostic operations of types r1, r2 or r3 (data-obtaining operations) are chosen. The interpretation of the selected productions is then conducted: the predicates are calculated and, if their values are true, the operations are added to the queue Q. The operations in the queue Q are processed in the following mode. One of the requirements of the diagnostics tools is to reduce the total execution time of operations on the computing node. To achieve this, a strategy of executing the maximum number of operations per unit time was selected: all control-diagnostic operations added to the queue Q are executed immediately, each in a separate thread, in parallel with other operations. The only condition for starting an operation from the queue is a check that no other operation is being performed on the same component from O; this condition is checked in order to avoid the execution of mutually contradictory operations. After the completion of an operation from the queue Q, the set of defined characteristic values expands and, as a result, the list of ready productions expands. The procedure for interpreting the ready and not previously completed productions then starts again. It should be noted that in the second and subsequent steps of the production interpretation, in addition to the productions with operations of types r1, r2 and r3, productions with operations of type r6 on the right side (actions in critical situations) are also selected. If the predicate of such a production is true, its operation is placed at the first position in the queue; this approach ensures an immediate response to critical situations. After all the operations of types r1, r2, and r3 are completed, the next stage is executed: productions with operations of type r4 (troubleshooting operations) on the right side are selected. After the completion of all r4-type operations, the next stage is executed with productions containing r5-type operations (troubleshooting verification operations) on the right side. At the last stage, when the r6-type operations have been performed and all other operations are completed, the diagnostics process is considered complete. It should be noted that all elements of the sets O, Z, C, F, P, I of the structure S are provided with informative descriptions. During the interpretation of the predicates and operations, the actions taken are sequentially added to the diagnostics log file L; the diagnostics log is constructed according to this principle.
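Building on the hypothetical structures above, the staged interpretation described in this section can be sketched as a simplified, sequential loop; the real tools execute operations on different components in parallel threads, which this illustration omits.

```python
# Simplified, sequential sketch of the staged interpretation of productions.
# Assumes Predicate, Production, Operation and PRIORITY from the previous sketch.
def run_diagnostics(productions, z):
    """z is the vector of currently known characteristic values (a dict)."""
    stages = [("r1", "r2", "r3"), ("r4",), ("r5",), ("r6",)]
    for stage in stages:
        progress = True
        while progress:
            progress = False
            # Critical-situation productions (r6) are eligible at every stage and
            # are executed first, mirroring their priority 0 in the queue Q.
            eligible = [p for p in productions
                        if not p.completed
                        and (p.operation.rtype in stage or p.operation.rtype == "r6")
                        and p.predicate.ready(z)]
            eligible.sort(key=lambda p: PRIORITY[p.operation.rtype])
            for p in eligible:
                if p.predicate.holds(z):
                    # Newly measured values may make further productions ready.
                    z.update(p.operation.action(z))
                    p.completed = True
                    progress = True
    return z
```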
III. EXAMPLE OF NODE DIAGNOSTICS

As an example, consider a simple diagnostics process for the RAM of a computing node. Let the elements of the sets of the structure S be as follows.

O: o1 – the RAM of the computing node; o2 – a user task executed on the node.

Z: z0 – initial condition of the diagnostics (type string); z1 – total RAM size (type integer); z2 – used RAM size (type percent); z3 – identification number of the failed RAM bank (type integer); z4 – status of sending a notification to the administrator of the HPC-system (type logical); z5 – status of sending a notification to the user of task o2 (type logical); z6 – status of the command stopping the execution of task o2 (type logical); z7 – status of finishing task o2 (type logical).

C: c0: z0 == "quick" – perform quick diagnostics; c1: z1 != 2^33 – the total size of RAM should be 8 GB; c2: z3 > 0 – the number of the failed RAM bank was successfully identified; c3: z2 > 99 – more than 99% of the node RAM is used.

F: f1: {o1} → {z1, z2} – operation to obtain node RAM information (type r1); f2: {o1} → {z3} – operation to localize a failed RAM bank (type r3); f3: {o1, z3} → {z4} – operation to notify the HPC-system administrator about the failed RAM bank (type r6); f4: {o2, z2} → {z5} – operation to notify the user of task o2 about exceeding the RAM limit (type r6); f5: {o2} → {z6} – stop the task execution (type r4); f6: {o1, o2} → {z7} – check the status of stopping the execution of program o2 (type r5).

P: p0: IF c0 THEN f1 – perform quick node diagnostics; p1: IF c1 THEN f2 – the RAM size has changed, the faulty RAM bank needs to be localized; p2: IF c2 THEN f3 – notify the administrator about the identified failed RAM bank; p3: IF c3 THEN f4 – notify the user of task o2 about exceeding the RAM limit; p4: IF c3 THEN f5 – stop task o2, because the RAM limit has been exceeded.

The diagnostics process for such a structure may work as follows. Let z0 = "quick". After the interpretation of the ready productions, the operations queue will include the following operation: Q = {f1}. When the operation f1 completes, the values of the parameters z1 and z2 become defined. Suppose that the node has two 4 GB RAM banks and one bank has failed; then z1 = 2^32 (in such cases the operating system often continues to work, excluding the address space of the failed bank). Also assume that z2 > 99 (critical memory usage is reached, with the possibility of swap usage). In the next step, the predicates c1 and c3 are true, and the operations Q = {f4, f2} are added to the queue in accordance with the order of production processing. In the next step, the predicates c2 and c3 are true, and the operations Q = {f3, f5} are added to the queue. In the last step, the operation Q = {f6}, which checks that the process has stopped, is added to the queue. It should be noted that this example only illustrates the models and methods above and does not contain the complete list of operations and productions for a comprehensive diagnostics of the RAM of a computing node.
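The RAM example maps directly onto the previous sketches. The following illustrative instantiation uses stubbed measurement functions (the 2^32-byte total imitates a node that has lost one of its two 4 GB banks); it is not the actual agent code, and the trigger chosen for f6 is an assumption, since the paper adds f6 in the final step without naming its production.

```python
# Illustrative instantiation of the RAM-diagnostics example (stub measurements).
# Assumes Operation, Predicate, Production and run_diagnostics from the sketches above.
def measure_ram(z):            # f1 (r1): obtain total (z1) and used (z2) RAM
    return {"z1": 2**32, "z2": 100}

def locate_failed_bank(z):     # f2 (r3): localize the failed RAM bank (z3)
    return {"z3": 2}

def notify_admin(z):           # f3 (r6): notify the administrator (z4)
    return {"z4": True}

def notify_user(z):            # f4 (r6): notify the owner of task o2 (z5)
    return {"z5": True}

def stop_task(z):              # f5 (r4): stop task o2 (z6)
    return {"z6": True}

def check_stopped(z):          # f6 (r5): verify that o2 has stopped (z7)
    return {"z7": True}

productions = [
    Production(Predicate("z0", "==", "quick"), Operation("f1", "r1", "o1", measure_ram)),
    Production(Predicate("z1", "!=", 2**33),   Operation("f2", "r3", "o1", locate_failed_bank)),
    Production(Predicate("z3", ">", 0),        Operation("f3", "r6", "o1", notify_admin)),
    Production(Predicate("z2", ">", 99),       Operation("f4", "r6", "o2", notify_user)),
    Production(Predicate("z2", ">", 99),       Operation("f5", "r4", "o2", stop_task)),
    # Assumed trigger for the final verification step.
    Production(Predicate("z7", "==", False),   Operation("f6", "r5", "o2", check_stopped)),
]

print(run_diagnostics(productions, {"z0": "quick", "z7": False}))
```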
IV. REALIZATION OF THE DIAGNOSTICS TOOLS

The software implementation of the diagnostics tools is realized as a part of the meta-monitoring toolkit [9, 10] designed for heterogeneous large-scale HPC-systems that include hundreds of thousands of computing nodes. This toolkit is based on the use of service-oriented technology, multi-agent technology, methods of creating expert systems, methods of decentralized data processing, distributed data storage, and decentralized decision-making. The model and methods described above were implemented as part of the autonomous software agents of the meta-monitoring system that function on the compute nodes in the background mode. The agents perform the functions of collecting node state data, analyzing the collected data, generating and executing control actions, and communicating with other agents. The ability to analyze data and to make the necessary decisions on the node side is the key feature of the implemented approach. As discussed above, in most monitoring systems the collected data are always sent to the central node, where they are processed and stored. Such an implementation significantly reduces the number of operations needed to analyze data on the node side; however, the regular sending of data to the master node creates an additional load on the network protocol stack, which also requires CPU time (formation, analysis and control of network packets). In our approach, the analysis of the data on the node side allows data to be sent to the central node only when necessary. A comparative analysis of the Ganglia monitoring client (gmond) and the agent of the developed system showed that, for the same data-gathering frequency (every 30 s), our agent with the functions of local data analysis and local decision-making consumes 37% less CPU time.

Figure 1. The architecture of the base subsystems of the diagnostics tools.

The diagnostics tools include the following base subsystems (Fig. 1):
- decentralized data storage, which includes the knowledge base (specifications of characteristics, productions, predicates, the list of previously detected problems and faults, etc.), the control-diagnostics operations library (modules realized using the specialized language), the round-robin database storing periodically obtained monitoring data, and log files;
- the control subsystem, which coordinates node diagnostics and includes mechanisms to start the diagnostics process at certain time intervals or at moments when the computing node is idle;
- the subsystem for the interpretation of productions and predicates;
- the control-diagnostics operations manager;
- the execution subsystem for the queue of control-diagnostics operations; and
- the logging subsystem.

The program realization of the agent and all base subsystems was implemented in the C++ programming language. The collection of information about the node state is implemented using the SIGAR library [11]. The control-diagnostics operations are implemented using a specialized language developed by the author. This language is a subset of ECMA Script [12], extended to support calls of external commands, output stream processing and a number of other mechanisms that allow obtaining data about the state of non-standard software and hardware devices. The interpreter of productions and predicates is based on the ECMA Script interpreter as well. All the specifications of the diagnostics tools are described in the JSON format. To store and access the collected data, a solution based on round-robin database principles was implemented; the developed solution showed a higher performance for our tasks in comparison with the universal tools (RRDtool and MRTG [13]). For the storage of productions, predicates, the list of previously detected problems and other information, the light-weight embeddable relational database SQLite [14] is used.

V. EXPERIMENTAL RESULTS

The developed methods and tools of diagnostics were successfully tested in the Supercomputer Center of ISDCT SB RAS [15]. The configuration of its heterogeneous HPC-system includes the following resources:
- 110 nodes with AMD Opteron 6276 processors (3520 cores in total);
- 20 nodes with Intel Xeon 5345 processors (160 cores in total);
- 2 nodes with GPU Nvidia Tesla C1060 accelerators (1920 cores in total).

During the testing of the diagnostics tools, several software and hardware resources in a critical condition or in an error state were detected:
- [node011.mat.icc.ru]: RAM bank #4 failed;
- [node-12.bf.icc.ru]: CPU temperature above 87°C;
- [node-04.bf.icc.ru]: SMART health status bad;
- [node103.mat.icc.ru]: /store used 100%;
- [node087.mat.icc.ru]: iface ib0 errors detected;
- [tesla01.icc.ru]: CPU warnings in the system log file.

The automatic testing of node components, fault localization and troubleshooting, the isolation of unreliable nodes, and HPC-system reconfiguration ensured the prevention of many failures of computational processes.

VI. CONCLUSION

The described approach allows the control, diagnostics and troubleshooting of the software and hardware components of HPC-system nodes in a finite number of steps. It ensures the minimization of the run time of the diagnostics and troubleshooting processes through the mechanisms of parallel operation execution, and it supports an increase in the fault tolerance of nodes, thereby increasing the fault tolerance of the PPs executed on these nodes. All these features make it possible to enhance the fault tolerance of the HPC-system.

ACKNOWLEDGMENT

The study was supported by the Russian Foundation for Basic Research, projects no. 15-29-07955-ofi_m and no. 16-07-00931. The author is grateful to A. G. Feoktistov for discussions and valuable comments.

REFERENCES
[1] F. Cappello, A. Geist, W. Gropp, S. Kale, B. Kramer, M. Snir, "Toward exascale resilience: 2014 update," Supercomputing Frontiers and Innovations, vol. 1, no. 1, pp. 5-28, April 2014.
[2] P. Kogge, J. Shalf, "Exascale Computing Trends: Adjusting to the 'New Normal' for Computer Architecture," Computing in Science & Engineering, vol. 15, no. 6, pp. 16-26, November 2013.
[3] B. Mohr, "Scalable parallel performance measurement and analysis tools – state-of-the-art and future challenges," Supercomputing Frontiers and Innovations, vol. 1, no. 2, pp. 108-123, July 2014.
[4] M. Massie, B. Li, V. Nicholes, V. Vuksan, Monitoring with Ganglia: Tracking Dynamic Host and Application Metrics at Scale, O'Reilly Media, November 2012.
[5] A. Adinets, P. Bryzgalov, V. Voevodin, S. Zhumatii, D. Nikitenko, K. Stefanov, "Job Digest: an approach to dynamic analysis of job characteristics on supercomputers," Numerical Methods and Programming: Advanced Computing, vol. 13, no. 4, pp. 160-166, 2012 (in Russian).
[6] "HP OpenView," http://www.openview.hp.com/solutions/ams/ams_bb.pdf [online, accessed: 31-Jan-2016].
[7] K. Lu, X. Wang, G. Li, R. Wang, "Iaso: an autonomous fault-tolerant management system for supercomputers," Frontiers of Computer Science, vol. 6, no. 3, pp. 378-390, May 2014.
[8] A. Antonov, V. Voevodin, A. Daugel-Dauge, S. Zhumatii, D. Nokonenko, "Ensuring effective operational control and battery life of the MSU Supercomputer," Bulletin of South Ural State University, vol. 4, no. 2, pp. 33-43, 2015 (in Russian).
[9] I. V. Bychkov, G. A. Oparin, A. P. Novopashin, I. A. Sidorov, "Agent-Based Approach to Monitoring and Control of Distributed Computing Environment," Lecture Notes in Computer Science, vol. 9251, pp. 253-257, September 2015.
[10] I. A. Sidorov, A. P. Novopashin, G. A. Oparin, V. V. Skorov, "Methods and tools of meta-monitoring of distributed computing environments," Bulletin of South Ural State University, vol. 3, no. 2, pp. 30-42, 2014 (in Russian).
[11] "System Information Gatherer and Reporter API," https://github.com/hyperic/sigar [online, accessed: 31-Jan-2016].
[12] "Standard ECMA-262," http://www.ecma-international.org/publications/standards/Ecma-262.htm [online, accessed: 31-Jan-2016].
[13] H. Allen, P. Regnauld, "Network management workshop: MRTG/RRDTool," APNIC 29, Kuala Lumpur, March 2010.
[14] G. Allen, M. Owens, The Definitive Guide to SQLite, Apress, November 2010.
[15] "Irkutsk Supercomputer Center of SB RAS," http://hpc.icc.ru/ [online, accessed: 31-Jan-2016].

Logical-Probabilistic Analysis of Distributed Computing Reliability

A.G. Feoktistov and I.A. Sidorov
Matrosov Institute for System Dynamics and Control Theory of SB RAS, Irkutsk, Russia
agf@icc.ru

Abstract - The aim of this study is to develop tools for increasing the reliability of problem solving in a heterogeneous distributed computing environment by applying diagnostics of the components of computing resources and an analysis of problem solving schemes. A scheme (a plan) is an abstract program for solving a problem. Special attention is paid to the calculation of the reliability of a problem solving scheme on the basis of a logical-probabilistic method. This method is based on the transition from Boolean functions describing the reliability of a problem solving scheme to probability functions determining the indicators of such reliability. The reliability of a problem solving scheme is improved by resource reservation. The resource reservation applied in a problem solving scheme provides a reliability indicator that approximates as closely as possible the predetermined reliability criterion, taking into account the limitations on the number of reserve resources. An example of a problem solving scheme and the calculation of its reliability is presented.

I. INTRODUCTION

Today, special attention of specialists in the field of cloud infrastructures and Grid systems is given to developing the fundamentals of distributed computing for solving large-scale scientific problems in various subject areas. New methods and algorithms for computing management in a heterogeneous distributed computing environment (DCE), whose nodes have a complex hybrid structure, are being actively developed. The heterogeneity of a DCE, the presence of hybrid components in its structure, the wide range of fundamental and applied problems, and the necessity of scalable computing make the research related to improving the reliability of computational processes highly relevant. In the theory of computer system reliability there are two traditional approaches to providing the reliability of computational processes [1]. The first approach attempts to restore a computational process after a fault of the software or hardware components of a computer system. The second approach aims to preserve the operability (fault tolerance) of a computer system during a fault of its software or hardware components. The first approach is typically realized using checkpoint mechanisms. Unfortunately, these mechanisms do not allow a computational process to be correctly restored after a fault for some classes of parallel programs.
In addition, the total time of checkpointing and restarting a computational process can be comparable to the time of solving the problem in a heterogeneous DCE [2]. The second approach is based on the use of redundancy of hardware, processes and data. Within this approach there are problems with switching from a faulty component to a reserved component [3, 4]. A promising approach to improving computer system reliability [5, 6] is the proactive detection and troubleshooting of current and potential hardware faults and the use of this knowledge for planning computational processes and allocating computing resources for the execution of these processes. However, the implementation of this approach for a heterogeneous DCE raises some important problems, among them: the lack of universal models and methods for determining reliability indicators for a heterogeneous DCE; the lack of methods and tools for applying reliability indicators in traditional resource managers; and the insufficient automation of troubleshooting in traditional monitoring systems for DCEs. In this paper some aspects of an approach to ensuring the reliability of problem solving in a heterogeneous DCE are considered. This approach is developed for a DCE characterized by the following features:
- Computer clusters act as the nodes of the DCE. These clusters can consist of hardware and software components (computational elements) of various architectures and configurations for different parallel programming technologies. Clusters are organized on the basis of both allocated and non-allocated computational elements and hence differ significantly in the reliability degree of their computing resources.
- Different levels of the DCE have various categories of users working at them. Clusters are used both by local users of these clusters and by global users of the DCE.
- To solve a problem, a user should form a job for the DCE. A job is a specification of the solved problem that includes information on the required computing resources, the executable applied programs, the input/output data and other required information.
- The specialized multi-agent system (MAS) for distributed computing management [7] is used for formulating problems, planning problem solving schemes and forming the jobs of global users. The formed jobs are decomposed by the MAS into sets of subjobs for the clusters. These subjobs, together with the jobs of local users, are managed by traditional local resource managers at the cluster level. Agents of the MAS observe the subjob execution processes.
- Clusters do not have enough free resources to simultaneously process all the jobs in their queues.

The features of our approach are listed below. The high competition of users for the common resources of the DCE makes it necessary to take into account all the characteristics of these resources in the process of their allocation in order to achieve the required quality of job execution [8]. Usually the time or cost of job execution, indicators of resource load balancing, or coefficients of resource efficiency or performance are used as the main criteria of efficiency for distributed computing management. Today, besides the listed criteria, the reliability of job execution in the DCE is a criterion of practical importance; increasing the reliability of distributed computing allows the other criteria of job execution to be fulfilled to a greater degree. The calculation of the reliability of a problem solving scheme is carried out on the basis of a logical-probabilistic method [9]. This method provides a transition from Boolean functions describing the reliability of the system under study to probability functions determining the indicators of such reliability. Reservation is one of the simplest and most effective methods of improving computer system reliability [10]. In this case, the reservation consists in using additional (reserve) nodes, which are expected to assume the functions of faulty nodes during the execution of a problem solving scheme. The additional nodes form a loaded reserve: these reserve nodes are also used as main nodes. A loaded reserve is used because other forms of reservation [11], such as an unloaded reserve, a hot reserve and multi-version job realization, lead to significant overhead and are more suitable for real-time systems. The reservation is carried out for single elements (nodes), and a large number of nodes may be required for such a reservation. Hence, this paper addresses the problem of forming a set of reserve nodes that provides a reliability indicator approximating as closely as possible the predetermined reliability criterion, taking into account the limitations on the number of reserve resources. Data about node reliability are provided by the authors' meta-monitoring system considered below. Successful results of using the logical-probabilistic method for studying distributed systems are known [12, 13]; a practical comparison of similar methods is discussed in [14].

II. META-MONITORING SYSTEM

In the considered DCE, the meta-monitoring system is used to test the DCE nodes, to collect data about the current state of the DCE nodes, to detect node faults, and to diagnose and partially repair these faults. In this system the processes of collecting data, monitoring nodes, and detecting and diagnosing faults are based on multi-agent technologies and unique methods for the decentralized processing and distributed storage of data. Unlike known systems, the meta-monitoring system provides:
- the ability to obtain data from the most popular high-performance monitoring systems (Ganglia, Nagios, Zabbix, etc.);
- a wide range of traditional and original functions for obtaining and collecting data about the hardware and software components of the DCE nodes;
- a high-level toolkit for implementing original data-obtaining and data-collecting functions as modules in various programming languages;
- specialized tools for collecting and analyzing data of the engineering infrastructure of computer clusters and data centers;
- auxiliary tools for the unification and aggregation of data received from different sources;
- new intelligent tools for the automated expert analysis of data and the generation of control actions changing node states;
- new service tools for the periodic testing of nodes and the detection, diagnosis and partial repair of node faults;
- a special application programming interface based on open standards that provides access to the monitoring data for external software systems.

The accumulation of data about faults is carried out in both automatic and manual modes. The registration of faults in the automatic mode is performed by means of the meta-monitoring tools listed above. The registration of faults in the manual mode allows the DCE administrator to describe hardware and software faults that were not recognized in the automatic mode. The basic information about a fault is the following:
- the object (computing node, storage system, facility of the engineering infrastructure of a computer cluster, etc.)
containing a faulty component;  the faulty component (hard drive, bank of RAM, network adapter, etc.);  the degree of a component fault impact: without consequences for the object, a fault of a computational process in the object, a critical fault and, as result, non-operability of the object;  the time of a fault detection;  the time of troubleshooting;  the result of troubleshooting: a complete correction of the fault, a partial correction of the fault and further using the object in the DCE, excluding the object from the DCE configuration and reconfiguring the DCE. MIPRO 2016/DC VIS An estimation of reliability indicators of the DCE objects is based on accumulated data about the object component faults. The meta-monitoring system, mentioned above, is described in details in [15]. III. LOGIC-PROBABALISTICS ANALYSIS A. Conceptual Model Denote by the symbols F, Z, N and A the sets of program modules, modules parameters, computing nodes and agents correspondently. The relations between these sets are denoted by the symbols Rin , Rout , Raf and Ran such that Rin  Z  F , Rout  Z  F , Raf  A  F and Ran  A  N . Then, a conceptual model of the DCE can be described by the structure M  F , Z , N , A, Rin , Rout, Raf , Ran  . The relations Rin и Rout define respectively the sets of input and output parameters for modules and, thus, determine the information-logical connections between modules. The relation Raf determines connections between agents and modules that can be performed by agents. A type of the relations Rin , Rout and Raf is the “many-to-many”. The relation Ran determines agents’ nodes. A type of the relation Ran is the “one-to-many”. Nodes of various clusters have different degrees of the reliability. Computer clusters are represented by agents in the management system of the DCE. The user’s request for computational services of the DCE is defined as a problem in the nonprocedural form for the management system: “calculate the values for parameters of the subset Z out  Z knowing the values for parameters of the subset Z in  Z ”. In general, a set of schemes for solving this problem can exist in a model of the DCE. Each of these schemes determines what modules and in what order should be executed. Denote by S the set of schemes. It is required to form the set S , to choose a single scheme s  S , to determine the agents, which will to execute the modules of the scheme s , and to allocate the nodes for an execution of these modules. Indicators of the scheme performance must satisfy to such criteria of the problem solving as the time or the cost. An additional criterion is the scheme reliability – the probability p* t  of the scheme performance in the time moment t . All these criteria are set by a user. A satisfaction degree of the scheme performance indicators to the user’s criteria defines a level for the quality of service in the DCE. In this formulation, the problem of computation planning and resource allocation is the NP (Non-deterministic Polynomial)-hard [16]. In order to support further generality reasoning, two fictitious modules f1 and f 2 are entered into the set F . MIPRO 2016/DC VIS The initial module f1 has the empty set of input parameters and the subset Z in  Z as the set of output parameters. This module defines initial data of the problem. The target module f 2 has the subset Z out  Z as the set of input parameters and the empty set of output parameters. This module defines target parameters of the problem. 
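To make the structure M = <F, Z, N, A, R_in, R_out, R_af, R_an> more concrete, the following sketch represents it with plain Python data structures. The names follow the paper's f1/f2 convention for the fictitious initial and target modules, but the particular modules, parameters, nodes, agents and their wiring are made up for illustration and are not taken from the paper.

```python
# A minimal sketch (illustrative, not the authors' code) of the conceptual model
# as plain Python sets of tuples.
M = {
    "F": {"f1", "f2", "f3", "f4"},                           # program modules
    "Z": {"z1", "z2", "z4"},                                 # module parameters
    "N": {"n1", "n2", "n3"},                                 # computing nodes
    "A": {"a1", "a2"},                                       # agents
    "R_in":  {("z1", "f3"), ("z2", "f4"), ("z4", "f2")},     # input parameters of modules
    "R_out": {("z1", "f1"), ("z2", "f3"), ("z4", "f4")},     # output parameters of modules
    "R_af":  {("a1", "f3"), ("a2", "f4")},                   # which agent can run which module
    "R_an":  {("a1", "n1"), ("a1", "n2"), ("a2", "n3")},     # nodes owned by each agent
}

# The user's non-procedural request: compute the parameters of Z_out
# knowing the values of the parameters in Z_in.
Z_in, Z_out = {"z1"}, {"z4"}
```

From such a model, a planning component can derive candidate problem solving schemes by chaining modules whose input parameters are covered by the outputs of previously executed modules; this is what the bipartite graph G = (V, U) introduced next encodes.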
The set S can be represented by the bipartite directed acyclic graph G  V ,U  . The set V includes the subsets VZ and VF of vertices corresponding to parameters from the set Z and modules from the set F . Denote by n f the number modules included in the scheme s . Assume that, information-logical connections between modules of the scheme s are described by the Boolean matrix W . Dimensions of this matrix are n f  n f . The matrix element wi ,k  1 means that the module f i depends on the module f k . B. Multi-agent System A hierarchical structure of the MAS can include two or more layers functioning agents. Agents can play different roles and perform different functions at their levels of the hierarchical structure of the MAS. The agents roles can be constant and temporary, occurring at discrete time moments due the need to organize collective interaction. The hierarchy levels of agents differ by the amount of their knowledge. Agent of the higher level hierarchy has a large amount of knowledge in comparison with agents of the lower hierarchy level. A subsystem of the MAS for distributed computing management includes the agents of computation planning and resource allocating. These agents are designed for creating problem solving schemes and allocating resources for schemes performance. Any agents may be united in a virtual community of agents (VCA). In such VCA the agents interact between themselves on the base of a competition or cooperation. In detail, the multi-agent algorithm of computation planning and resource allocation is considered in [17]. This algorithm provides the obtaining scheme s satisfied to the time or cost criteria, which defined by a user. The data from meta-monitoring system are used in MAS. Let us now consider new aspects of the multi-agent management related with the scheme s reliability. C. Logical-Probabalistics Model Denote by na the number of agents in some VCA. These agents are expected to participate in the scheme s performance. Denote by x the set of the Boolean variables xi , j , k representing the events of execution ( xi , j ,k  1 ) or not-execution ( xi , j ,k  0 ) for the module f i in the k-th node allocated by the agent a j , i  1, n f , j  1, na . The index k of the variable xi , j , k defines correspondently the main ( k  1 ) and reserve ( k  2 ) 249 nodes of the agent a j . These nodes are allocated for an execution of the module f i . Each agent a j can allocate the number c j of the reserve nodes in the scheme s TABLE I. module f i in k-th node of the agent a j . The recurrent formulas (1)-(3) describe a logical circuit of the execution reliability of a problem solving process: y1 x   1 , if i  2,  hx , yi  x      xi , ji ,1hx , if i  3, hx    k :wi , k 1 y k x  , (1)   +   If the following condition is fulfilled na (2) c j 0, j 1 (3) then the transformation process is finished. Else, the determination of the index for module with minimum probability of this module execution is carried out:  k  arg min 1   i 3, n f , ji J    1  p  , e i , ji ,l l 1 where the number of nodes allocated by the agent a ji for the module f i is denoted by ni , ji and (4) e  ni, ji . where ji 1, na . A transition [18] from the Boolean function y2 x  to the probability function Pt  is implemented through the rules in Table I. The probability function Pt  has the following form: Pt   nf p i 3 i , ji ,1  The number c jk is reduced on one unit. 
 The reservation of the additional node by the agent a jk for the module f k is carried out by replacing the element xk , jk ,e of the transformable formula for the function y2 x  by the element xk , jk ,e  xk , jk ,e1 , where the number of nodes t  . The function Pt  calculates the probability for the single scenario of the scheme s performance. If Pt   p* t  , then function y2 x  transformation by means of the probability indicators improvement for the function structure elements is carried out. The function y2 x  determined in (4) is represented in the disjunctive normal form. Therefore, this function is monotonic. Such property provides the preservation or improvement of the scheme s probability when the probability indicators for the function structure elements are improved. 250  transformation process of the function y2 x  corresponding to the scheme s without nodes reservation into the function y2 ' x  corresponding to the scheme s with nodes reservation includes following stages. The Boolean function y2 x  corresponding to the target module f 2 determines the scheme s reliability indicator. After all substitutions in (1)-(3), the function y2 x  takes the form i 3 1 pi, j, k t   a ji allocates the main ( k  1 ) node for the module f i . nf xi, j , k Denote by J the set of indexes for such agents that c j  0 , J  j : c j  0, j 1, na . . The algorithm for the where i 1, n f , k  1, n f , ji 1, na . Initially, the agent y2 x    xi , ji ,1 , xi, j ,k The probability function element pi, j ,k t  The Boolean function element performance. Assume that, all nodes of the same agent are homogeneous. A possibility to run modules in the main or reserve nodes causes various scenarios of the scheme s performance. Let the Boolean function yi x  determines conditions of the module f i execution in the scheme s performance and the function pi , j ,k t  shows a probability of the RULES OF TRANSITING FROM BOOLEAN FUNCTIONS TO PROBABILITY FUNCTIONS allocated by the agent a jk for the module f k is denoted by nk , jk and e  nk , jk .  The number nk , jk is increased on one unit.  y'2 x  obtained The function in the transformation process of the function y2 x  is represented in the following form: n y '2 x    K l , l 1 (5) MIPRO 2016/DC VIS problem “calculate the parameter z 4 value knowing the parameter z1 value”. nf K l   xi , ji ,e , i 3 nf n na  n i, j , i 3 j 1 where e  k ji , k ji 1, ni, ji . Each elementary conjunction in (5) represents the one of the possible scenarios of the scheme s performance. All elementary conjunctions are numbered from 1 to n in according to their rank ascending. The orthogonalization of the function y'2 x  is carried out on the base of the algorithm offered in [18]. Such orthogonalization of the function y'2 x  provides an incompatibility of the possible scenarios of the scheme s performance. The orthogonal function ~ y '2 x  has the following form: ~ y '2 x  K1  K 1K 2  ... K 1 K 2 ...K n1K n .   Such constructions as “ f i , f j ” and “ f i  f j ” mean correspondingly that the modules f i and f j are executed sequentially only or may be executed in parallel. The scheme reliability indicator is defined as following: p* t   0,995 . All Boolean functions yi x  for the problem solving schemes are shown in Table II. TABLE II. 
A simplification of the function ~ y '2 x  by means of the removing of the conjunctions that are identically zero and the redundant conjunctions is carried out. A transition from the Boolean function ~ y ' x  to  p' t  , i iI where the probability of the i-th scenario for the scheme s performance is denoted by p'i t  .  The scheme s reliability is calculated with the using of the function P' t  .  If P' t   p* t  , then the transition to the first stage is carried out. Else, the transformation process is finished. The nodes reservation process considered above provides scheme reliability indicator that is as close as possible to the given criterion of scheme reliability taking into account the limitations on the number of nodes allocated by each agent. These limitations ensure convergence of the nodes reservation process. SCHEME RELIABILITY CALCULATION EXAMPLE Let the set S (Fig. 1) includes the two schemes s1 : f1, f 3  f 4 , f 6 , f 2 and s4 : f1 , f 3  f 5 , f 6 , f 2 for the MIPRO 2016/DC VIS BOOLEAN FUNCTIONS yi x  FOR PROBLEM SOLVING SCHEMES Scheme s1 s2 y1x   1 , y1x   1 , y3 x   y1x x3, j3 ,1 , y3 x   y1x x3, j3 ,1 , y4 x   y1x x4, j4 ,1 , 2 the probability function P' t  is implemented through the rules in Table I. The probability function P' t  has the following form: P' t   IV. Figure 1. Bipartite directed acyclic graph for the set S y5 x   y1x x5, j5 ,1 , y6 x   y3 x y4 x x6, j6 ,1 , y6 x   y3 x y5 x x6, j6 ,1 ,  x3, j3 ,1x4, j4 ,1x6, j6 ,1 .  x3, j3 ,1x5, j5 ,1x6, j6 ,1 . y2 x   y6 x   y2 x   y6 x   Let the set A  a1 , a2 , a3 , a4 , a5  includes 5 agents and the nodes reliability of these agents are correspondingly following: 0,999, 0,999, 0,9999, 0,9999 and 0,99. The modules distribution by the agents are shown in Table III. The performance probabilities for the schemes s1 and s 2 without nodes reservation are correspondently equal 0,9988 and 0,9898. Thus, the reservation of the additional node in scheme s 2 is necessary. TABLE III. Scheme s1 s2 Module MODULES DISTRIBUTION BY AGENTS Agent 1 2 3 4 5 f3 – – – + – f4 – + – – – f6 – – + – – f3 – – – + – f5 – – – – + f6 – – + – – 251 After the modules distribution by agents, the function y2 x  corresponding scheme s 2 has the following form: 16-07-00931. The authors are grateful to G. A. Oparin for discussions and valuable comments. y2 x   x3,4,1 x5,5,1 x6,3,1 . REFERENCES [1] The reservation is implemented by the agent a5, because the resources of this agent provide the minimum probability for execution of the module f 5 in comparison with probabilities for execution of the modules f 3 and y '2 x  have f 6 . After reservation, the functions y'2 x  и ~ the forms: [2] y'2 x   x3,4,1 x5,5,1 x6,3,1  x3,4,1 x5,5,2 x6,3,1 , [5] ~ y '2 x   x3,4,1 x5,5,1 x6,3,1  x3,4,1 x5,5,2 x6,3,1 x 5,5,1 . [3] [4] [6] A transition from the Boolean function ~ y '2 x  to the probability function P' t  and calculation of the scheme s2 probability are implemented. P' t   p3,4,1 p5,5,1 p6,3,1  p3,4,1 p5,5,2 p6,3,1 1  p5,5,1    0,9997 [7] [8] [9] Since P' t   p* t  , the transformation process is finished. V. CONCLUSION The results of the study considered above suggest the following conclusions.    Up to now, while logical-probabilistic analysis is carried out, occurs the problem of constructing Boolean functions, which describe the reliability logic circuit of a studied system. 
In this paper the new universal recurrent formulas for constructing such functions based on a conceptual model of the heterogeneous DCE are obtained. The use of the models and algorithm developed for logic-probabilistic analysis of the distributed computing reliability in multi-agent management system improves the quality of service for users’ jobs in the DCE. An integrated application of the diagnostic methods and tools of computing nodes jointly with the tools for the problems solving processes analysis in these nodes on the scheme level provides a substantial increase reliability of the DCE on the whole. ACKNOWLEDGMENT The study was supported by Russian Foundation of Basic Research, projects no. 15-29-07955-ofi_m and no. 252 [10] [11] [12] [13] [14] [15] [16] [17] [18] D. P. Siewiorek, R. S. Swarz, Reliable Computer Systems: Design and Evaluation. Natick, MA: CRC Press, 1998. F. Cappello, A. Geist, W. Gropp, S. Kale, B. Kramer, M. Snir, "Toward exascale resilience: 2014 update," Supercomputing Frontiers and Innovations, vol. 1, nо. 1, pp. 5-28, April 2014. E. Elmroth and J. Tordsson, “A standards-based Grid resource brokering service supporting advance reservations, coallocation, and cross-Grid interoperability,” Concurrency and ComputationPractice & Experience, vol. 21, no. 18, pp. 2298-2335, June, 2008. E. Bauer, R. Adams Reliability and Availability of Cloud Computing, Wiley-IEEE Press, September 2012. С. Engelmann, G.Vallee, T. Naughton, "Proactive Fault Tolerance Using Preemptive Migration," Parallel, Distributed and Networkbased Processing, 2009 17th Euromicro International Conference on, pp. 252 – 257, February 2008 G. Da Costa, T. Fahringer, J. Rico-Gallego, I. Grasso, A. Hristov, H. Karatza, A. Lastovetsky, F. Marozzo, D. Petcu, G. Stavrinides, D. Talia, P. Trunfio, H. Astsatryan, "Exascale machines require new programming paradigms and runtimes," Supercomputing Frontiers and Innovations, vol. 2, no. 2, pp. 6-27, September 2015. I. V. Bychkov, G. A. Oparin, A. G. Feoktistov, V. G. Bogdanova, A. A. Pashinin, "Service-oriented multiagent control of distributed computations," Automation and Remote Control, vol. 76, no. 11, pp. 2000-2010, November 2015. D. A. Menasce, E. Casalicchio, “QoS in Grid Computing,” IEEE Internet Computing, vol. 8, no. 4, pp. 85-87, July 2004. I. A. Ryabinin, “Logical-Probabilistic Calculus: A Tool for Studying the Reliability and Safety of Structurally Complex Systems,” Automation and Remote Control, vol. 64, no. 7, pp. 1177-1185, July 2003. J. Li, “A Model of Resource Reservation in Grid,” International Conference on Environmental Science and Information Application Technology, Wuhan, China: IEEE CS Pupl., pp. 199202, July 2009. D. A. Zorin, V. A. Kostenko, “Algorithm for Synthesis of RealTime Systems under Reliability Constraints,” Journal of Computer and Systems Sciences International, vol. 51, no. 3, pp. 410-417, May 2012. S. Rai, M. Veeraraghavan, K. Trivedi, “A Survey of Efficient Reliability Computation Using Disjoint Products Approach,” Networks, vol. 25, pp. 147-165, May 1995. J. Xing, C. Feng, X. Qian, P. Dai, “A simple algorithm for sum of disjoint products,” Reliability and Maintainability Symposium, Reno: IEEE CS Publ., pp. 1-5, January 2012. A. Rauzy, E. Chatelet, Y. Dutuit, C. Berenguer, “A practical comparison of methods to assess sum-of-products,” Reliability Engineering & System Safety, vol. 79, no. 1, pp. 33-42, January 2003. I. V. Bychkov, G. A. Oparin, A. P. Novopashin, I. A. 
Sidorov “Agent-Based Approach to Monitoring and Control of Distributed Computing Environment,” Lecture Notes in Computer Science, vol. 9251, pp. 253-257, September 2015. M. R. Garey, D. S. Johnson, “Computer and Intractability. A Guide to the Theory of NP-Completeness,” New York: W. H. Freeman and Company, 1979. V. G. Bogdanova, I. V. Bychkov, A. S. Korsukov, G. A. Oparin, A. G. Feoktistov, “Multiagent Approach to Controlling Distributed Computing in a Cluster Grid System,” Journal of Computer and Systems Sciences International, vol. 53, no. 5, pp. 713-722, September 2014. A. A. Pospelov, “Logical Methods of Circuit Analysis and Synthesis,” Moscow-Leningrad: Energiya, 1964. (in Russian). MIPRO 2016/DC VIS Distributed Graph Reduction Algorithm with Parallel Rigidity Maintenance * Diego Sušanj* and Damir Arbula* Faculty of Engineering, Department of Computer Engineering, Rijeka, Croatia dsusanj@riteh.hr, damir.arbula@riteh.hr Abstract - Precise localization in wireless sensor networks depends on distributed algorithms running on a large number of wireless nodes with reduced energy, processing and memory resources. Prerequisite for the estimation of unique nodes locations is rigid network graph. To satisfy this prerequisite network graph needs to be well connected. On the other hand, execution of distributed algorithm in the graphs with large number of edges can present significant impact on scarce resources of wireless nodes. In order to reduce number of edges in the network graph, novel distributed algorithm for network reduction is proposed. Main objective of the proposed algorithm is to remove as many edges as possible maintaining graph rigidity property. In this paper a special case of graph rigidity is considered, namely the parallel rigidity. I. INTRODUCTION In today's era of rapid development of technology and human progress, concept of ubiquitous computing has become a reality in the form of computers, mobile phones, home appliances, and other, so called, smart devices. Directly driven by such progress is a need for simple and fast connection of all those devices without relying on any existing fixed infrastructure. Wireless ad hoc networks were created in response to the above mentioned needs. They represent a selfconfiguring, decentralized dynamic networks in which devices or nodes, can move freely without depending on existing fixed infrastructure. In other words, devices are both the users and the infrastructure of those networks. A. Wireless sensors networks A certain type of ad hoc wireless networks consisting of spatially distributed autonomous sensors which are used to monitor various environmental conditions, such as temperature, pressure, illumination, etc. are called wireless sensor networks. These networks are made up of several to hundreds of interconnected nodes with cooperative relaying of the information through the network to the input/output gateway nodes. Size and the price of these nodes are directly dependent on constraints such as the type of power source, the amount of memory, processing power and communication bandwidth. The ultimate goal is to have nodes that are as independent as possible which relates to low or no maintenance and no use of the existing infrastructure, most notably in terms of power. MIPRO 2016/DC VIS Nodes are usually equipped with batteries and optionally they have some means of energy harvesting i.e. means of obtaining energy from the environment. 
Consequently, it is essential to pay attention to energy consumption both on hardware and on software level. There are two basic rules: (1) energy needed to send a unit of data is far greater than the energy needed to process the same amount of data, and (2) messages sent in smaller steps with lower transmission power is more energy efficient than those sent in a single step with greater transmission power [1]. B. Problem definition The motivation for this work primarily stems from the issues of scalability of wireless sensor networks, more precisely excessive network density. Problems arise when there is a need for more advanced calculations, such as localization of dense network on low end nodes and because of that, successful execution of algorithm is limited by the maximum number of neighbors for each node [2]. In this sense, an adequate selection of neighbors enables implementation of distributed algorithms in dense wireless networks without losing network rigidity property thus preserving information required for the estimation of unique nodes locations. II. LOCALIZATION Location service is one of the basic services of many emerging computing and networking paradigms [5]. In the case of wireless sensor networks, sensor nodes need to know their locations in order to put detected and recorded events into proper spatial context, or for example to enable more informed and thus more efficient routing. Sensors are usually deployed without any previous knowledge of their absolute or relative location and there is no infrastructure that would enable them to localize themselves after deployment. One method of informing node of its location is manual measurement and location calculation. Such procedure is not feasible for larger networks or in the case of deployment in the inaccessible environment. Another method involves the use of GPS, the global positioning system. To be able to localize the node using GPS, it is necessary to add a GPS receiver, which significantly increases costs of the hardware. Also its size and energy consumption are oftentimes prohibitive, as 253 well as the requirement of being in the range of satellite signals i.e. outdoors. A. Location estimation Localization methods are classified by measurements modality and two most popular are distance and azimuth between neighboring nodes. In this paper only methods based on azimuth measurements are considered. Azimuth measurements between neighboring nodes are usually performed using few different techniques. One is based on using known radiation pattern of one or more directional antennas, measuring received signal strength. By comparing the intensity of the received signal received on multiple antennas the angle of arrival of the signal can be determined [3]. Another approach is based on the measurement of the radio signals phase differences using an array of omnidirectional antennas. There is also similar technique that is using ultrasound signals [4]. B. Anchored and anchor-free localization The location information can be defined in the absolute coordinate system of the environment, or in the relative coordinate system – one that is being agreed upon between nodes themselves. Although the actual absolute locations in the latter case are not known, network topology is matching and, by using translation, rotation and scaling transformations, the relatively positioned network graph can be aligned with its absolute counterpart. 
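The alignment step mentioned above can be sketched as a two-dimensional similarity transform estimated from anchor correspondences. The following Python snippet is a minimal sketch under the assumption that the relative map differs from the absolute one only by translation, rotation and uniform scaling (no reflection); all coordinates are illustrative.

```python
import numpy as np

def align_relative_to_absolute(rel_pts, abs_pts):
    """Least-squares 2D similarity transform w ~= a*z + b using complex numbers:
    a encodes rotation and scale, b the translation."""
    z = np.asarray([complex(x, y) for x, y in rel_pts])
    w = np.asarray([complex(x, y) for x, y in abs_pts])
    A = np.column_stack([z, np.ones_like(z)])
    (a, b), *_ = np.linalg.lstsq(A, w, rcond=None)
    return a, b

def apply_transform(a, b, pts):
    z = np.asarray([complex(x, y) for x, y in pts])
    w = a * z + b
    return [(c.real, c.imag) for c in w]

# Anchors: positions in the relative map and the corresponding absolute positions.
rel_anchors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
abs_anchors = [(10.0, 5.0), (10.0, 7.0), (8.0, 5.0)]
a, b = align_relative_to_absolute(rel_anchors, abs_anchors)
print(apply_transform(a, b, [(1.0, 1.0)]))   # -> approximately [(8.0, 7.0)]
```

Two anchors already determine the four parameters of such a transform; a third non-collinear anchor additionally disambiguates a possible reflection, which matches the requirement of at least three non-collinear anchors stated below.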
The basic premise of anchored localization is having set of nodes (anchors) that know their actual location in the environment i.e. by using GPS or manually. Using at 2 least three (in the two-dimensional space -  ) or four (in 3 case of  ) non-collinear anchors it is possible to localize the entire network. In case of anchor-free localization there is no information about actual location of any node in the network. The goal of this localization is to construct new relative coordinate system usually pinned by placing one node in the origin and one of its neighbors at coordinate (1, 0). In this paper only two-dimensional case is considered. III. GRAPH RIGIDITY Basic definitions of graph theory and rigidity are given in [9] and they are the basis for the modern theory of rigidity and localization of wireless sensors networks that is developed further in [5], [6] and [7]. Wireless network topology is usually described using graph, but to be able to include nodes locations, it is necessary to augment this description with the positions of the graph vertices. This concept is called point formation (or framework) and it represents one of the possible spatial realization of the network graph. Congruent formations are those that have equal distances between all vertices while equivalent formations have equal distances between connected vertices. 254 Formation is defined as a globally rigid if, and only if, all equivalent formations are also congruent. The key property of networks with globally rigid formations is that it is possible to estimate unique location of all nodes. Figure 1. Generic (a) and non-generic (b) formation Generic formations are point formations in which the coordinates of the vertices are algebraically independent over the set of rational numbers. If the formation is not generic, there is possibility that set of neighboring nodes is part of same d  1 -dimensional space, which results in existence of equivalent, but not congruent formations. Fig. 1 shows two different formations of the same graph with the location of vertices that are: (a) generic, (b) not generic because these vertices are collinear and it's possible to find equivalent non-congruent formation. The problem of determining whether the individual formation is globally rigid is NP-hard problem, while the determination of generic formation is easier. In addition, as an important feature, and a reason why the graph theory is so thoroughly studied, is that global rigidness of generic formation depends only on the network graph. A. Parallel rigidity Equation (1) defines mapping of measured azimuths between nodes to a set of values which represents global azimuths under which node i see node j , (2). Two point formations p and q are parallel if their corresponding  N and azimuth functions  are equal for all vertices i and j provided i  j . graphs  : L  [0, 2 ) (1)   i, j   ij (2) Example of two parallel formations is shown in Fig. 2. Subfigure (a) shows the basic formation, while in (b) one can see translation of that formation. Subfigures (c) and (d) show preservation of azimuth measurements i.e. angles between edges of the graph, while scaling the network. Subfigure (e) shows a non-congruent realization of the formation in (a) while maintaining constraints set by measured azimuths. Formation which has more than one non-congruent solutions while using constraints based on azimuth measurements is called flexible, while the one that has only one congruent solution is called parallel rigid. 
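A quick way to see what "parallel" means operationally is to compare the azimuth functions of two formations edge by edge. The sketch below is illustrative only (it is not the algorithm proposed later in the paper): it checks whether two point formations of the same graph are parallel.

```python
import math

def azimuth(p, q):
    """Global azimuth under which point p sees point q, in [0, 2*pi)."""
    return math.atan2(q[1] - p[1], q[0] - p[0]) % (2 * math.pi)

def are_parallel(pts_p, pts_q, edges, tol=1e-9):
    """True if every edge (i, j) has the same azimuth in both formations."""
    for i, j in edges:
        d = abs(azimuth(pts_p[i], pts_p[j]) - azimuth(pts_q[i], pts_q[j]))
        if min(d, 2 * math.pi - d) > tol:   # handle wrap-around at 2*pi
            return False
    return True

edges = [(0, 1), (1, 2), (0, 2)]
p = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0)}
q = {i: (2 * x + 5.0, 2 * y - 1.0) for i, (x, y) in p.items()}  # scaled and translated copy
print(are_parallel(p, q, edges))  # True: translation and scaling preserve all azimuths
```

A flexible formation admits a realization that passes this check yet is not congruent to the original, whereas a parallel rigid formation does not.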
One such parallel rigid formation is given in figure (f) [5], [9]. MIPRO 2016/DC VIS Every node, after receiving token, records sender’s ID in stack. In the case node doesn't have any more neighboring nodes from which it has not received token he returns token to the node on top of the stack. When node receives token from three different neighbors it is marked as rigid but continues to process information in the same way as before. Algorithm terminates globally when root node receives return token and has no neighbors left to forward it to. In that moment, all nodes should be flagged as rigid with newly defined set of traversed edges L q which is subset of initial set L p . Figure 2. Parallel formations IV. PROPOSED METHOD In this paper we propose a method for edge removal based on modified depth first search. Algorithm starts in the root node. Distributed election of the root node is specific problem that was not focus of proposed method thus the node with the lowest ID was always selected as the root node. Root node sends message to all his neighbors requesting from them to response with number of their neighbors. This number is defined as node weight. After root receives weights from all of its neighbors, a special message called token is sent to the neighbor with lowest weight. In the case in which several nodes have the same weight token is sent to the one with lowest ID. Node receiving token starts the same process as root, and requests all his neighbors’ weights. Fig. 3 shows two examples of network graphs before (on the left) and after (on the right) execution of edge removal algorithm. Network graphs after execution of the algorithm have far less edges in the dense parts of the graph. V. RESULTS Testing of this algorithm is done in three phases. First two phases are performed using Pymote simulator [8]. First phase includes automated testing on two sets of networks. First set is made of networks with specific topologies on which this, and other algorithms, didn't perform well. Second set is based on random generated networks with number of nodes between 8 and 128. In this phase proposed algorithm was tested on 8000 networks. In the second phase benchmark testing was performed in order to compare few versions of proposed algorithm. Benchmark set was made up of 100 networks with number of nodes between 8 and 16. Comparison of algorithms is based on two parameters (1) maximum node weight after algorithm execution compared to the number before execution and (2) total number of remaining edges. In the third phase five network topologies selected from the benchmark set were implemented and tested on real nodes in laboratory conditions. Proposed algorithm passed all three phases always producing rigid networks. Figure 3. Network graphs before and after edge removal algorithm MIPRO 2016/DC VIS Figure 4. Maximum number of neighbours (weight) before and after edge removal algorithm for benchmark set networks 255 To alleviate communication complexity while retaining the rigidness of network graph, distributed algorithm was designed and implemented. This edge removal algorithm was tested using both Pymote simulator and real wireless sensor nodes. The results obtained in the simulator and in the implementation on the real nodes in the laboratory have shown a significant reduction in the number of edges and maximum network weight while retaining graph rigidness. Figure 5. Number of edges before and after edge removal algorithm for benchmark set networks Graph on Fig. 
4 represents maximum node weight per network before and after execution of algorithm compared to the number of nodes in network. Algorithm is executed in the 100 networks from the benchmark set with random topologies. Important observation is that maximum weight after edge removal algorithm (full line) does not increase proportionally with the number of nodes in the network. Another measure of algorithm success, is the number of remaining edges after execution of edge removal algorithm as shown in Fig. 5. Number of removed edges increases with increasing density of the network. VI. CONCLUSION Wireless sensor networks consist of a few dozens up to several thousand of spatially distributed autonomous sensors which are used to monitor various environmental conditions. In these networks location information is required for many services i.e. to provide measurement proper spatial context or to localize recorded events. Localization methods have their limitations. One of them is a need for higher computing resources and higher energy demand on low end nodes. Energy demand is notable in dense networks, because of higher communication complexity. Solution for that problem is reducing the density i.e. by creating communication tree or reduced communication graph. On the other hand, requirement to estimate unique location of all nodes is global rigidity of network formation. 256 The field of localization in wireless sensor networks and the development of new, and improving existing algorithms is extremely broad. Future work includes improvement of proposed algorithm by creating minimal rigid graph (graph with theoretic minimum number of edges), as well as creating method for selecting a subset of neighbors which results in as small as possible localization errors. Furthermore, it would be interesting to study the rigidity with directed graph, i.e. network where mutual visibility of nodes is not guaranteed. [1] [2] [3] [4] [5] [6] [7] [8] [9] REFERENCES F. Zhao and L. J. Guibas, Wireless Sensor Networks: An Information Approach. Morgan Kaufmann Publishers, Elsevier Inc., 2004. D. Arbula, “Distributed algorithm for node localization in anchor free wireless sensor network,” Faculty of Electrical Engineering and Computing, University of Zagreb, 2014. J.-R. Jiang, C.-M. Lin, F.-Y. Lin, and S.-T. Huang, “ALRD: AoA Localization with RSSI Differences of Directional Antennas for Wireless Sensor Networks,” International Journal of Distributed Sensor Networks, vol. 2013, Mar. 2013. N. B. Priyantha, A. K. L. Miu, H. Balakrishnan, and S. Teller, “The Cricket Compass for Context Aware Mobile Applications,” in 7th ACMConf. Mobile Computing and Networking (MOBICOM), 2001. T. Eren, W. Whiteley, and P. N. Belhumeur, “A Theoretical Analysis of the Conditions for Unambiguous Node Localization in Sensor Networks,” Department of Computer Science, Columbia University, 2004. T. Eren, W. Whiteley, and P. N. Belhumeur, “Using Angle of Arrival (Bearing) Information in Network Localization,” in 2006 45th IEEE Conference on Decision and Control, 2006, pp. 4676– 4681. D. Zelazo, A. Franchi, and P. R. Giordano, “Rigidity Theory in SE(2) for Unscaled Relative Position Estimation using only Bearing Measurements,” arXiv:1311.1044 [math], Nov. 2013. D. Arbula and K. Lenac, “Pymote: High Level Python Library for Event-Based Simulation and Evaluation of Distributed Algorithms,” International Journal of Distributed Sensor Networks, vol. 2013, Mar. 2013. G. Laman, “On graphs and rigidity of plane skeletal structures,” J Eng Math, vol. 
4, no. 4, pp. 331–340, Oct. 1970. MIPRO 2016/DC VIS Architecture of Virtualized Computational Resource Allocation on SDN-enhanced Job Management System Framework Yasuhiro Watashiba∗ , Susumu Date† , Hirotake Abe‡ , Kohei Ichikawa∗ , Yoshiyuki Kido† , Hiroaki Yamanaka§ , Eiji Kawai§ , and Shinji Shimojo† ∗ Nara Institute of Science and Technology, Nara, Japan Email: {watashiba, ichikawa}@is.naist.jp † Osaka University, Osaka, Japan Email: {date, kido, shimojo}@cmc.osaka-u.ac.jp ‡ University of Tsukuba, Ibaraki, Japan Email: habe@cs.tsukuba.ac.jp § National Institute of Information and Communications Technology (NICT), Tokyo, Japan Email: {hyamanaka, eiji-ka}@nict.go.jp Abstract—Nowadays, users’ computation requests to a highperformance computing (HPC) environment have been increasing and diversifying for requiring large-scale simulations and analysis in the various science fields. In order to efficiently and flexibly handle such computation requests, resource allocation of the virtualized computational resources on an HPC cluster system such as Cloud Computing service is attracting attention. Currently, we aim to realize a novel resource management system (RMS) that enable to handle various resources of an HPC cluster system, and have been studying and developing the SDNenhanced Job Management System (JMS) Framework, which can manage an interconnect as network resources by integrating Software Defined Networking (SDN) concept into a traditional JMS. However, the current SDN-enhanced JMS Framework cannot allocate virtualized computational resources to a job because the computational resource management is performed by the mechanism of a traditional JMS. In this paper, we propose a mechanism to handle virtualized computational resources on the SDN-enhanced JMS Framework. This mechanism enables to deploy virtual machines (VMs) requested by the user to the computing nodes allocated to a job and execute job’s processes in the VMs. I. I NTRODUCTION Nowadays, a high performance computing (HPC) center, which operates HPC environment to provide resources for users, is required the efficient resource management due to a fact that users’ computations has been increasing and diversifying. In various science fields, the necessity of HPC resources to perform scientific simulations and analysis has been growing for the purpose of gaining high computing performance and solving complex and/or large-scale scientific problems. Since each of such computations has different resource usage pattern and resource requirements, the HPC center has to accommodate many and various user computations into its HPC environment. Thus, it is essential that the resources in an HPC environment are handled by considering the characteristics of each computation. MIPRO 2016/DC VIS The dominant architecture of the HPC environment has been becoming a cluster system, which is composed of many computing node connected with a high-speed network called an interconnect. An general HPC cluster system has two type of resources: computational resources such as CPU and memory, and network resources in its interconnect. Thus, it is important to efficiently and flexibly manage these resources in order to provide an appropriate set of resources for user’s computation. 
In many HPC cluster systems, Job Management System (JMS), such as NQS [1], PBS [2] and Open Grid Scheduler/Grid Engine (OGS/GE) [3], is widely adopted as Resource Management System (RMS) which has the role to administer users’ computations as jobs and efficiently allocate resources to them for the purpose of efficient workload balancing. However, most traditional JMS available today can only handle computational resources. This causes the computing performance degradation under the influence of inefficient allocation of the other resources to a job. In regard to the computational resource allocation to a job managed by a traditional JMS, today’s diversified user computations may require more flexible resource provision from an HPC cluster system. Generally, the computational resources allocated to a job by a traditional JMS are physical computational resources, that is, the computation of the job is performed on computing nodes directly. Since the computing environment in a computing node, such as the version of the kernel and libraries, is static, it may not be suitable for a computation of job with a specified resource requirement and restriction. Therefore, a mechanism to provide computational resources with user-required computing environment is also required. In order to realize such flexible computational resource management, virtualized computational resources, which is allocated computational resources as virtual machines (VMs) to a job, have been focused on. However, since virtualized computational resources have 257 some overhead coursed by virtualizing the physical computational resources, the computing performance on VMs may be lower than the case of physical resource allocation. Thus, a user who submits a job suitable for the allocation of physical computational resources hopes traditional resource allocation rather than the provision of virtualized computational resources. For achieving efficient and flexible resource management, a JMS on an HPC cluster system is desirable to allocate both physical and virtualized computational resources according to the features of jobs. However, a traditional JMS in an HPC cluster system does not have any mechanism to handle VMs as virtualized computational resources, and the RMS for Cloud Computing targets to allocate resources to users’ computations by virtualized computational resources. From these considerations above, we aim to realize a novel flexible and efficient JMS capable of handling various resources as well as computational resources on an HPC cluster system. So far, we have addressed an issues for managing an interconnect in an HPC cluster system as network resources, and have been studying and developing network-aware JMS, called SDN-enhanced JMS framework [4], [5]. The SDNenhanced JMS framework can allocate an appropriate set of computational and network resources by integrating Software Defined Networking (SDN) into a traditional JMS. However, in the SDN-enhanced JMS framework, since a traditional JMS takes charge of the management of computational resources, the SDN-enhanced JMS framework does not have the functionality to control virtualized computational resources. In this paper, we propose the resource management mechanism to handle VMs as virtualized computational resources on the SDN-enhanced JMS framework as well as physical computational resources. This paper is structured as follows. 
Section II briefly introduces the SDN-enhanced JMS framework, which can manage not only physical computational resource, but also the interconnect of HPC cluster system as network resources. Section III analyses the required functionalities for handling VMs as virtualized computational resources, and then explains the implementation of the function enhancement on the SDNenhanced JMS framework. In Section IV, we evaluate the behaviors of the proposed mechanism to control virtualized computational resources. In Section V, we conclude this paper with the future work. II. SDN- ENHANCED JMS F RAMEWORK In this section, we briefly mention the system structure and mechanism of the SDN-enhanced JMS framework [4], [5]. The SDN-enhanced JMS framework enables to manage and allocate both computational and network resources by integrating OpenFlow [6], which is a technology for realizing the SDN concept, into a traditional JMS. The OpenFlow enables the administrator to manage the whole network in a centralized and programmable manner through a SDN controller, which can be designed as software. Figure 1 shows the architecture of the SDN-enhanced JMS framework. In designing the SDNenhanced JMS framework, the low-cost reusability and a high 258 degree of program portability were considered for facilitating to cooperate with various traditional JMSs. As the result, the functionalities to handle network resources was implemented as an external module called Network Management Module (NMM). Therefore, the mechanism to manage computational resources in the SDN-enhanced JMS framework is provided by the traditional JMS as illustrated in Fig. 1. In the implementation of the SDN-enhanced JMS framework, OGS/GE [3] has been adopted as a traditional JMS to integrate the functionalities of network resource management. In the SDN-enhanced JMS framework, an interface to cooperate with the NMM is required for a traditional JMS. The OGS/GE has the Parallel Environment Queue Sort (PQS) API, which allows an administrator to define the new policy to decide the allocating computational resources. The SDNenhanced JMS framework realize the cooperation between the traditional JMS and the NMM by enhancing the PQS API. The NMM has two components: the network control component and the brain component. The network control component has two functionalities to manage network resources in the interconnect of HPC cluster system: retrieving the information of the interconnect, and controlling the communication paths among computing nodes allocated to a job. In order to implement these two functionalities by leveraging the functions of OpenFlow, Trema [7], which is a development framework of the OpenFlow controller, has been included in the network control component. The role of the brain component is to decide how to allocate resources to a job based on resource usage of the HPC cluster system and user requirement of the job. To gather the information of a target job and the usage of both computational and network resources, the brain component is connected with a traditional JMS and the network control component through the XML-RPC [8]. Moreover, a traditional JMS do not have any algorithm to determine appropriate resources for a job based on the usage and requirement of both computational and network resources. Thus, the brain component has the resource assignment policy class module, through which the administrator can define how to allocate resources to a job as a resource assignment policy. 
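As a rough illustration of the kind of decision such a resource assignment policy encodes, the following Python sketch scores candidate node sets by combining the CPU load of the nodes with the utilization of the interconnect paths between them. It is not the framework's actual API (which, as described next, is exposed to the administrator as a Ruby scripting interface), and all field names are assumptions.

```python
from itertools import combinations

# Hypothetical data shapes: each candidate is a set of node names, cpu_load maps
# node -> current load in [0, 1], and link_util maps a sorted node pair
# (node_a, node_b) -> utilization in [0, 1] of the path between them.

def score(candidate, cpu_load, link_util, net_weight=0.5):
    """Lower is better: average CPU load plus weighted average path utilization."""
    nodes = sorted(candidate)
    cpu = sum(cpu_load[n] for n in nodes) / len(nodes)
    pairs = list(combinations(nodes, 2))
    net = sum(link_util.get(p, 0.0) for p in pairs) / max(len(pairs), 1)
    return cpu + net_weight * net

def choose(candidates, cpu_load, link_util):
    """Pick the candidate node set with the best combined score."""
    return min(candidates, key=lambda c: score(c, cpu_load, link_util))
```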
The resource assignment policy class module provides a set of APIs that facilitate to flexibly design a resource assignment policy in a Ruby script. In a job script, a user can require an appropriate resource allocation algorithm by indicating the name of resource assignment policy as a parameter. These mechanisms enable the SDN-enhanced JMS framework to manage and allocate both computational and network resources. In the SDN-enhanced JMS framework, network resources are handled as flow entries, which are defined how to communicate between computing nodes. That is, all communication paths between computing nodes allocated to each job are administered as allocated network resources. Moreover, the information of the network resource allocation and the the usage of each link in the interconnect of the HPC cluster system retrieved by the network control component are stored in a database which is equipped in the NMM. MIPRO 2016/DC VIS NetworkManagementModule Network Control OpenFlow controller (Trema) Interconnect Flow entry Database Brain ResourceAssignment PolicyClassModule Policy Traditional JMS (OGS/GE) Administrator J o b J o b shepherd shepherd execd execd Computingnode Computingnode qmaster Job Script User Fig. 1. Architecture of original SDN-enhanced JMS framework. III. A RCHITECTURE AND I MPLEMENTATION In this section, we derive the functionalities which are required for handling VMs as virtualized computational resources in the process flow of JMS, and then explain the implementation of the novel SDN-enhanced JMS framework with a mechanism to control virtualized computational resources. A. Virtualized Computational Resources The virtualized computational resources by leveraging a VM enables to flexibly control the computing environment in computational resources allocated to a job. In a VM, a user can construct arbitrary computing environment that does not depend on the structure of computing node in a HPC cluster system. Thus, a computation of a job can be perform on appropriate computing environment according to its resource requirement and restriction. Resource provision by leveraging VMs as virtualized computational resources has been adopted in Cloud Computing [9], [10]. In the HPC field, resource allocation by leveraging virtualized computational resources has attracted much attention due to the growth of virtual resource technology, and many researches to utilize virtualized computational resources have been conducted [11], [12], [13], [14]. For running a VM in a computing node, the hypervisor is generally required. The hypervisor works as the middleware for connecting between a VM and physical hardware (e.g. Kernel-based Virtual Machine (KVM), Xen, VMware vSphere Hypervisor, and Hyper-V). The VMs in a computing node are not shared the internal resources each other due to a fact that they are managed as different computing environments. Since a today’s computing node has of many CPU cores and large memory, the characteristic is useful to guarantee the performance of computational resources allocated to a job from other jobs’ behaviors. In the viewpoint of resource management, virtualized computational resources also have some advantages compared with the resource management based on physical computational resources. MIPRO 2016/DC VIS B. 
Functionalities for Handling Virtualized Computational Resources In this section, we analyze the required functionalities for controlling the virtualized computational resources as well as the physical computational resources and network resources on the SDN-enhanced JMS framework. To realize that virtualized computational resources are handled on the SDN-enhanced JMS framework, the following functionalities are considered to be necessary. 1) User interface to indicate an arbitrary user-required VM for executing a program in a job. 2) Controlling the behaviors of VMs in computing nodes allocated to a job. 3) Installing modules and configuration for deploying the job process in VMs. 4) Managing the information of virtualized computational resources allocated to jobs. The first functionality is user interface to indicate a userrequired VM for executing a computation in the job. This functionality is necessary to allocate arbitrary computing environment constructed by a user to a job in an HPC cluster system. Additionally, the parameters to define the number of CPU cores and the amount of memory in a VM should be included in this functionality. The second functionality is required for controlling the behavior of VMs because the SDNenhanced JMS framework cannot operate the action of VMs in computing nodes. Before starting to assign job processes to VMs, they have to boot up. Moreover, after the job is finished, the VMs must shut down for releasing the resources allocated to a job. Since the behavior of a VM is generally controlled by a hypervisor, this functionality enables to operate the hypervisor from the SDN-enhanced JMS framework. The third functionality is a mechanism to set up an environment of VM as a computing node of the HPC cluster system. In the proposed virtualized computational resource management, since it is supposed that the computing environment in a VM is constructed by a user, the VM does not have the information and the functions to control job processes from the SDN-enhanced JMS framework (e.g. environment variables and module to manage job processes, user information for authentication and authorization, a path to access the user’s home directory in the HPC cluster system, and so on). Thus, the functionality to construct computing environment capable of managing a VM from the SDN-enhanced JMS framework is essential. The fourth functionality is a mechanism to manage the information of virtualized computational resources allocated to jobs. Though the amount of used virtualized computational resources can be administered as the usage of physical computational resources, this information is necessary for administrators and users to check the status of virtualized computational resources. C. Implementation of Virtualized Computational Resource Management In this section, we describe the implementation of the novel SDN-enhanced JMS framework which equips the functional- 259 SDNͲenhancedJMSframework NetworkControl Brain OpenFlowcontroller (Trema) ResourceAssignment PolicyClassModule Flow entry Policy Interconnect Administrator Job Script User Traditional JMS (OGS/GE) qmaster Information ofVirtual Computational Resources vSwitch VM VMpool J o b shepherd vm control execd Computingnode Fig. 2. Architecture of SDN-enhanced JMS framework with the functionalities virtualized computational resource management. ities for handling virtualized computational resources. 
As for a hypervisor in a computing node, the KVM hypervisor is adopted due to the fact that the OS of a computing node in many current HPC cluster system is Linux. In order to manage virtualized computational resources as well as physical computational resources on the SDN-enhanced JMS framework, we have adopted a strategy of enhancing a mechanism of computational resource management on the OGS/GE. Figure 2 illustrates architecture of our proposed SDN-enhanced JMS framework with the functionalities for managing virtualized computational resource. For the first functionality listed in Section III-B, we implemented a user interface by extending the parameters of job script offered by the OGS/GE with similar uses as the requirement of network resources. An example of extended job script is shown in Fig. 3. The extended job script allows a user to embed the job’s requirements to virtualized computational resources as well as physical computational. In this example, the requirements of virtualized computational resources are composed of a path of VM image and parameters of VM setting such as the amount of memory in the VM. As for the number of CPUs configured in a VM, it is limited to one in this implementation for equalizing the number of allocating slots. The second functionality listed in section III-B was achieved by operating the KVM hypervisor on each computing node from the SDN-enhanced JMS framework. In order to control 260 #!/bin/csh #$ͲqQUEUE_NAME #$Ͳpe orte32 #$Ͳlnetprio=policy_name #$Ͳlvm image=/vm/myͲvm #$Ͳlvm memory=8gb mpirun Ͳnp$NSLOTS./a.out Fig. 3. Example of extended job script in proposed resource management system. the KVM hypervisor on each computing node, we developed a new control module, called “vm control”, and deployed it in each computing node. This module is called by the “execd” module managed by the qmaster of OGS/GE, and then performs the start or stop control of VMs on a computing node. In the OGS/GE, “execd” module usually calls “shepherd” module for deploying job processes in a computing node. In virtualized computational resource management in the proposed SDN-enhanced JMS framework, “execd” module calls vm control module instead of “shepherd” module for managing the behavior of VMs. Since this enhancement of the process flow to control job processes in a computing node is performed by utilizing a setting in the OGS/GE, the source code of OGS/GE is not changed. MIPRO 2016/DC VIS Decidingresourceallocationtojob Specifiation of computing node Calling“vm control” GettingVMinformation (1) Replacingnodeinformation (2) preͲprocessing MakingCDimageforVMsetting BootingupVM Fig. 5. OS CentOS 6.3 CPU Intel Xeon E5-2620(2.00GHz) x2 Memory 64GB Network on board Intel I350 GbE Specification of cluster system for evaluation. (3) ConfiguringVMenvironment Assigningjobprocess Finishingjobprocess StoppingVM Fig. 4. postͲprocessing Process flow of “vm control” module. Fig. 6. The third functionality listed in section III-B is implemented as scripts included in the “vm control” module. The functions of the “vm control” module are composed of three steps: (1) Getting the VM information, (2) modifying a list of computing nodes allocated to the job, and (3) setting up an environment in a VM based on retrieved information. Figure 4 shows the process flow of the “vm control” module. The step (1) is to get the hostname of VMs allocated to a job. In this implementation, the hostname of each VM is decided based on the one of allocating computing node. 
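The wrapper idea described above — booting the requested VM before the job process is handed over and releasing it afterwards — can be sketched as follows. This is not the authors' "vm control" module: the QEMU/KVM invocation, the environment variable names and the SSH-based hand-off are assumptions made purely for illustration (the actual implementation configures the guest through a generated CD image, as described next).

```python
import os, shlex, subprocess, time

def boot_vm(image_path, memory="8G", ssh_port=2222):
    """Start a KVM guest from a user-supplied disk image (illustrative flags only)."""
    cmd = (f"qemu-system-x86_64 -enable-kvm -m {memory} -smp 1 "
           f"-drive file={image_path},format=qcow2 "
           f"-net nic -net user,hostfwd=tcp::{ssh_port}-:22 -daemonize -display none")
    subprocess.run(shlex.split(cmd), check=True)
    return ssh_port

def wait_for_ssh(port, timeout=120):
    """Poll until the guest's SSH service answers (crude readiness check)."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if subprocess.run(["ssh", "-p", str(port), "-o", "ConnectTimeout=2",
                           "user@localhost", "true"]).returncode == 0:
            return
        time.sleep(2)
    raise TimeoutError("VM did not become reachable")

if __name__ == "__main__":
    # Hypothetical hand-off: image path and memory would come from the job's
    # requirements, and the job process (or the JMS's own shepherd) would then
    # be launched inside the guest.
    port = boot_vm(os.environ.get("VM_IMAGE", "/vm/example.qcow2"),
                   os.environ.get("VM_MEMORY", "8G"))
    wait_for_ssh(port)
    subprocess.run(["ssh", "-p", str(port), "user@localhost", "./run_job.sh"])
```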
Since the VM hostname is not registered as the computing node of the HPC cluster system to the OGS/GE, the SDNenhanced JMS framework cannot handle them. Thus, it is necessary to retrieve the hostname of all VMs allocated to the job. In the step (2), the list of allocating computing nodes generated by the OGS/GE is rewritten into the list of VMs. In the OGS/GE, the allocation of job processes is performed in accordance with the list of allocated computing nodes. In order to assign job processes to VMs by using the mechanism of the OGS/GE, replacing the list based on the hostname retrieved in the step (1) is required. The step (3) is to boot up allocated VMs on computing nodes and then reconfigure computing environment of the VM for executing a job process. In this step, a configuration process which is performed after a VM starting is prepared as a CD image with a script to set up it, and then the CD image is run in each VM starting. Moreover, since the CD image includes a script to execute “shepherd” module, the OGS/GE can assign a job process to the VM. Regarding the fourth functionality listed in section III-B, the brain component of the SDN-enhanced JMS framework store the information of VMs allocated to each job into a database. Since a traditional JMS provides a command to show the status of resource allocation to jobs, a command to refer the MIPRO 2016/DC VIS Situation of resource allocation to a job. information of VMs allocated to each job was implemented. Moreover, in this enhanced SDN-enhanced JMS framework, a user can also require the network resources through the same way. IV. E VALUATION In this section, we mention the evaluation of the behavior of the proposed SDN-enhanced JMS framework with the mechanism to control virtualized computational resources in the developing environment. The evaluation environment is a cluster system which is composed of 4 computing nodes and 3 OpenFlow switches (NEC PF5240) as illustrated in Fig. 5 To observe the allocation of virtualized computational resource by the proposed SDN-enhanced JMS framework, we conducted the experiment in which multiple job with the different number of processes were submitted to the cluster system. In the experiment, as a requirement of virtualized computational resources, every jobs requested a same VM image, which consists of a single CPU, 2GB memory and CentOS 6.3. Moreover, a user’s home directory was mounted to the VM through Network File System (NFS) by the configuration performed by the “vm control” module. Figure 6 shows the result of two commands to display the status of the allocation of physical and virtualized computational resources. The upper part of Fig. 6 is the result of qstat command, which is provided by the OGS/GE for displaying the usage of computing nodes. The lower part of Fig. 6 show the situation of the allocation of VMs by qstat+ command, which is developed in this implementation of the fourth functionality described in Section III-C. From Fig. 6, it was confirmed that the proposed SDN-enhanced JMS framework makes it possible to assign job processes in userrequested VM like as the allocation to physical computational 261 TABLE I OVERHEAD OF PROCESS TO Process Num 1 2 3 4 CONTROL VIRTUALIZED COMPUTATIONAL RESOURCES . Pre-process (sec) 1.299 1.315 1.140 1.196 Post-process (sec) 0.004 0.004 0.004 0.004 resources. The proposed SDN-enhanced JMS framework allows a user to request either physical or virtualized computational resources to the HPC cluster system. 
IV. EVALUATION

In this section, we evaluate the behavior of the proposed SDN-enhanced JMS framework with the mechanism to control virtualized computational resources in our development environment. The evaluation environment is a cluster system composed of 4 computing nodes and 3 OpenFlow switches (NEC PF5240), as illustrated in Fig. 5.

Fig. 5. Specification of the cluster system for evaluation: OS CentOS 6.3; CPU Intel Xeon E5-2620 (2.00 GHz) x2; Memory 64 GB; Network on-board Intel I350 GbE.

To observe the allocation of virtualized computational resources by the proposed SDN-enhanced JMS framework, we conducted an experiment in which multiple jobs with different numbers of processes were submitted to the cluster system. In the experiment, as the requirement for virtualized computational resources, every job requested the same VM image, consisting of a single CPU, 2 GB of memory and CentOS 6.3. Moreover, the user's home directory was mounted in the VM through the Network File System (NFS) by the configuration performed by the "vm control" module.

Figure 6 shows the output of the two commands that display the status of the allocation of physical and virtualized computational resources. The upper part of Fig. 6 is the output of the qstat command, which is provided by the OGS/GE to display the usage of computing nodes. The lower part of Fig. 6 shows the allocation of VMs reported by the qstat+ command, developed in this implementation of the fourth functionality described in Section III-C.

Fig. 6. Situation of resource allocation to a job.

From Fig. 6, it was confirmed that the proposed SDN-enhanced JMS framework makes it possible to assign job processes to a user-requested VM in the same way as to physical computational resources. The proposed SDN-enhanced JMS framework thus allows a user to request either physical or virtualized computational resources from the HPC cluster system. Furthermore, a user can simultaneously request network resources through the functionality of the original SDN-enhanced JMS framework, because the communication path between VMs is the same as the path between the computing nodes hosting the respective VMs.

Next, we measured the overhead caused by the "vm control" module when controlling VMs on the proposed SDN-enhanced JMS framework. The overhead of the "vm control" module consists of the pre-process and the post-process shown in Fig. 4: booting up the VMs on the allocated computing nodes and configuring the computing environment in the VMs before job execution starts, and shutting down the VMs allocated to the job after it has finished. Since these processes increase the resource allocation time compared with the original SDN-enhanced JMS framework, it is essential to evaluate the influence of the overhead generated by virtualized computational resource management. Table I shows the measured time of each process in the "vm control" module, classified by the number of processes in the job.

TABLE I. OVERHEAD OF THE PROCESSES TO CONTROL VIRTUALIZED COMPUTATIONAL RESOURCES
Process Num    Pre-process (sec)    Post-process (sec)
1              1.299                0.004
2              1.315                0.004
3              1.140                0.004
4              1.196                0.004

From this result, it was confirmed that the additional resource allocation time needed to control the behavior of the VMs is too small to hinder efficient resource management. Since calculating an appropriate set of resources generally takes several tens of seconds, the effect on system throughput is small. Moreover, since the number of processes did not affect the resource allocation time, the allocation of virtualized computational resources is also considered applicable to large-scale computations.

V. CONCLUSION

In the services of today's HPC centers, providing virtualized computational resources for a job, as in Cloud Computing, is required to handle the computational requests of various users. In this paper, we proposed a mechanism for virtualized computational resource management on top of the SDN-enhanced JMS framework, which we have been studying and developing for efficiently and flexibly managing various resources, and we described the architecture of the novel SDN-enhanced JMS framework capable of handling VMs as virtualized computational resources. In the evaluation, we showed that the proposed SDN-enhanced JMS framework allocates virtualized computational resources in the same manner as physical computational resources.

As future work, we will conduct evaluation experiments in a larger computing environment and confirm the effectiveness of this resource management technology. Moreover, we will investigate more flexible network resource management by leveraging the functions of the vSwitch.

ACKNOWLEDGMENTS

This research was supported in part by the collaborative research of the National Institute of Information and Communications Technology (NICT) and Osaka University (Research on High Functional Network Platform Technology for Large-scale Distributed Computing).

REFERENCES
[1] B. A. Kingsbury, The Network Queuing System, Sterling Software, Palo Alto, 1986.
[2] R. Henderson, "Job Scheduling under the Portable Batch System," in Job Scheduling Strategies for Parallel Processing, D. Feitelson and L. Rudolph, Eds. Springer, 1995, vol. 949, pp. 279–294.
[3] "Open Grid Scheduler: The official Open Source Grid Engine." [Online]. Available: http://gridscheduler.sourceforge.net/
[4] Y. Watashiba, S. Date, H. Abe, Y. Kido, K. Ichikawa, H. Yamanaka, E. Kawai, S. Shimojo, and H.
Takemura, “Performance Characteristics of an SDN-enhanced Job Management System for Cluster Systems with Fat-tree Interconnect,” in Emerging Issues in Cloud (EIC) Workshop, The 6th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2014), December 2013, pp. 781–786. [5] ——, “Efficacy Analysis of a SDN-enhanced Resource Management System through NAS Parallel Benchmarks,” The Review of Socionetwork Strategies, vol. 8, no. 2, pp. 69–84, December 2014. [6] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, “OpenFlow: Enabling Innovation in Campus Networks,” SIGCOMM Computer Communication Review, vol. 38, no. 2, pp. 69–74, Mar 2008. [7] “Trema: Full-Stack OpenFlow Framework for Ruby/C.” [Online]. Available: http://trema.github.com/trema/ [8] S. S. Laurent, E. Dumbill, and J. Johnston, Programming Web Services with XML-RPC. O’Reilly & Associates, Inc., 2001. [9] B. P. Rimal, E. Choi, and I. Lumb, “A taxonomy and survey of cloud computing systems,” in 2009 Fifth International Joint Conference on INC, IMS and IDC, 2009, pp. 44–51. [10] W.-T. Tsai, X. Sun, and J. Balasooriya, “Service-oriented cloud computing architecture,” in 2010 Seventh International Conference on Information Technology: New Generations (ITNG), 2010, pp. 684–689. [11] Y. Chen, T. Wo, and J. Li, “An efficient resource management system for on-line virtual cluster provision,” in Cloud Computing, 2009. CLOUD’09. IEEE International Conference on. IEEE, 2009, pp. 72– 79. [12] X. Li, H. Palit, Y. S. Foo, and T. Hung, “Building an hpc-as-a-service toolkit for user-interactive hpc services in the cloud,” in Advanced Information Networking and Applications (WAINA), 2011 IEEE Workshops of International Conference on, 2011, pp. 369–374. [13] M. Taifi, A. Khreishah, and J. Y. Shi, “Building a Private HPC Cloud for Compute and Data-Intensive Applications,” the International Journal on Cloud Computing: Services and Architecture (IJCCSA), pp. 1–20, 2013. [14] J.-L. Yu, C.-H. Choi, D.-S. Jin, J. R. Lee, and H.-J. Byun, “A Dynamic Virtual Machine Allocation Technique Using Network Resource Contention for a High-performance Virtualized Computing Cloud,” International Journal of Software Engineering & Its Applications, vol. 8, no. 9, 2014. MIPRO 2016/DC VIS Near Real-time Detection of Crisis Situations Sylva Girtelschmid∗ , Andrea Salfinger† , Birgit Pröll‡ , Werner Retschitzegger§ , Wieland Schwinger¶ ∗‡ Inst. for Application Oriented Knowledge Processing, †§¶ Dept. of Cooperative Information Systems Johannes Kepler University Linz Altenbergerstr. 69, 4040 Linz, Austria K0956640@students.jku.at∗ , {andrea.salfinger† , werner.retschitzegger§ , wieland.schwinger¶ }@cis.jku.at, bproell@faw.jku.at‡ Abstract—When disaster strikes, be it natural or man-made, the immediacy of notifying emergency professionals is critical to be able to best initiate a helping response. As has social media become ubiquitous in the recent years, so have affected citizens become fast reporters of an incident. However, wanting to exploit such ‘citizen sensors’ for identifying a crisis situation comes at a price of having to sort, in near real-time, through vast amounts of mostly unrelated, and highly unstructured information exchanged among individuals around the world. Identifying bursts in conversations can, however, decrease the burden by pinpointing an event of potential interest. 
Still, the vastness of information keeps the computational requirements for such procedures, even if optimized, too high for a non-distributed approach. This is where currently emerging, real-time focused distributed processing systems may excel. This paper elaborates on the possible practices, caveats, and recommendations for engineering a cloud-centric application on one such system. We used the distributed real-time computation system Apache Storm and its Trident API in conjunction with detecting crisis situations by identifying bursts in a streamed Twitter communication. We contribute a system architecture for the suggested application, and a high level description of its components’ implementation. I. I NTRODUCTION During the last decade, social media channels have evolved to the point of becoming omnipresent in our everyday lives. Platforms, such a Twitter, play a vital role in sharing information fast. It should come to no surprise that the Twitter data pool is popular for information mining as it can reveal valuable insights, be it for crisis monitoring applications, or marketing, to monitor the perception of a new product. Moreover, being able to utilize Twitter data to promptly detect a situation that requires a responsive action from a rescue crew, e.g. during natural disasters, can often save lives. However, even the situation detection alone is non-trivial. For a human, processing Twitter conversations for this purpose would mean having to look into the content of each message and extract the interesting information from it. A more efficient approach, better suited for a computer, is to cluster similar posts without having to first consider the semantics of the content. To satisfy the real-time detection demands, techniques for bursty topics identification are applicable here. Bursts are in this context collections of posts mentioning the same topic more often within the current period than within the previous one. The problem of detecting burst topics falls under a broader category of research dealing with Topic Detection and Tracking (TDT) that has been extensively addressed in academic articles. MIPRO 2016/DC VIS Of the various TDT approaches, clustering is a popular one to identify bursts. Other methods for burst detection include finite state automata, Fourier transform, time series, or Wavelet transform [1]–[3]. For our first setup, the clustering approach appealed the most to us thanks to its inherent property of being readily parallelizeable. However, the enormous throughput of the real-world streams of Twitter posts prohibits timely online processing of a sequential implementation of state-of-the-art methods for topic clustering. For an acceptable performance, this problem requires single-pass, and optimized methods, as well as scalable implementation. In the recent past there has been work on successfully applying Cloud technologies to the online clustering problem to guarantee scalability, and near realtime processing of streaming data [4]–[6]. In this paper we show that Cloud technologies have a potential in improving responsiveness during emergency situations through mining social media content. Since our system also needs to work with persistent data, we decided to make use of the Trident API, a high-level abstraction of Apache Storm, as it makes stateful processing more manageable than Storm alone. Although Trident is gaining popularity, to the best of our knowledge, we are not aware of any other systems that employ Trident for the use of Burst Topic Detection. 
In our work, we evaluate Trident’s applicability to detecting an outburst of a crisis situation in near real-time. We propose an architecture for such an online disaster detector system and contribute a high level description of a Trident topology implementation. The rest of this paper is organized as follows: In the next section we discuss a number of research areas related to our work and identify the specific ideas from which our devised system borrows. In sec. III, we detail our approach to detecting new emergency situations from Twitter data streams. Sec. IV discusses our test cases. Finally, in sec. V, we conclude on our findings, and provide an outlook on future work. II. R ELATED W ORK Due to the multi-disciplinarity of the envisioned application domain, related approaches, and valuable preparatory work need to be drawn from several areas. Techniques essential to address the requirements imposed by the crisis management application domain can be found in the research fields of Event Detection, First Story Detection, Burst Detection, Knowledge 263 Bases, Keyword and Topic Extraction, and Parallelization, which we will motivate and discuss in the following. a) Event Detection: Finding a common topic within a set of documents has been covered extensively in research (TDT initiatives). In the subject of crisis monitoring, this research also finds its application, as event/story detection is in this subject’s heart. There are various approaches to this problem: e.g. in [7], [8], the method of clustering documents based on word similarity using the Vector Space model (VSM) is presented. In [9], the clustering of the documents is improved by considering the locality information in the process of discriminating the events into the clusters. In [10], the similarity score calculation also incorporates the social network structure besides the content of the message. Yet another class of solutions for event detection, and tracking bases on probabilistic methods, such as presented in [11]– [13]. There, the actually encountered message density w.r.t. the specific topics is compared to an expected density. A valuable work summarizing research on event detection in the realm of Twitter can be found in [14]. b) Detection of Novel Events: Although organizing messages into clusters brings a useful insight to our work, it is not sufficient for our problem solution. We have to consider methods capable of identifying those topics that have not been discussed before within some long enough time period. In other words, we need to identify posts that are discussing a new story. First Story Detection (FSD) methods are therefore well applicable here, as they are designed to detect when a document discusses a previously unseen content. Examples of FSD implementation are presented in the works of [15], [16]. The authors propose an optimized approach to FSD, which makes it well suited for identifying events online, i.e. for identifying events from real-time streaming text such as tweets. The ideas set out in these works form the basis of our new event detection algorithm. c) Identifying Bursts: Typically, whenever a newsworthy event occurs, many people tend to share information about it on social media. This causes a temporal burst of closely related messages which can be captured [1], [17]. Detecting bursts has also been well studied [1]–[3], [18]. For example, in [3], the author achieves real-time event detection by clustering wavelet-based signals. 
Wavelet transformation technique is applied to build signals for individual words, which are cross correlated and the signals are then clustered using a scalable eigenvalue algorithm. In our approach, we use a bucketing method, similar to that proposed by [15], [16], since it directly fits the parallel programming paradigm we adopted. d) Knowledge-based Sensing: In our system, we also need to consider techniques that allow us to report only events which we are interested in. In this context, an event is defined as a topic that suddenly draws the attention of the public. As such, it may often be of no value for our purposes, since, for example, news about celebrity deaths are also causing abrupt increase in related posts being sent. Our application needs to be able to identify only those events that are related to disasters. To achieve this, a disaster ontology 264 may be employed. There have been many attempts in building an ontology related to disaster. It is, however, often the case that existing disaster ontologies focus only on a specific type of disaster, which they explore and describe in detail [19], [20]. For our purposes, it is sufficient to have a high level disaster dictionary to identify the particular disaster situation referred to in the tweets. Working with multiple languages would, therefore, be also feasible. e) Extracting Keywords: Document keyword and keyphrase extraction is another complex problem addressed in the research [21]. The frequent techniques include word frequency analysis, distance between words, or lexical chains to rank the keywords. When it comes to keyphrase extraction, graph-based ranking methods are being successfully used. However, such methods are proposed primarily for a single document or a document collection. This is not directly applicable when it comes to extracting representative words from such a short document as a tweet. The large variety of topics found in tweets also makes this task more challenging. In [22], the authors propose a method specifically for keyword extraction from Twitter. The approach is based on organizing keyphrases by topics learnt from Twitter. For our purposes, the best approach is to apply a content-sensitive keyword extraction, as our tweet collection from which we need to extract the keywords is already formed of similar content. f) Meeting Real-time Demands: The algorithms that can achieve our task of detecting a new story from a real-time stream of data are comparatively computationally expensive. To safely achieve real-time response of a system processing the full streaming data flow from Twitter (the so called ‘firehose’ ), parallelization of the algorithm is necessary. Some recent studies have shown that the implementation of clustering algorithms on the Storm distributed framework is reasonable [4], [5]. In our implementation, however, we utilize the higher level Storm API called Trident. This design benefits from the guarantee of exactly-once semantics1 , as well as the ability to process streaming messages in batches which performs better when querying a database system. III. I MPLEMENTATION Extracting useful information from Twitter data flow is no doubt a challenge. The posts are not only short (140 characters), and varied in topic but also highly noisy, in that most of the conversations contain a lot of useless babble. 
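Because most of the stream is exactly this kind of noise, every post has to be cleaned up before any similarity computation; Section III-B1 below describes the preprocessing actually used (URL and mention removal, misspelling fixes, simple stemming). The following is only a minimal sketch of that kind of cleanup, with rules that are our own assumptions rather than the system's exact pipeline.

# Minimal tweet-cleanup sketch (illustrative only; the real pipeline's rules,
# e.g. its misspelling replacements and stemmer, are not reproduced here).
import re

URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")

def clean(tweet):
    text = URL.sub("", tweet)                  # drop links
    text = MENTION.sub("", text)               # drop @-mentions
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if len(t) > 2]   # crude noise filter

print(clean("Huge #storm hitting the coast now!! http://t.co/xyz @weather"))
# ['huge', 'storm', 'hitting', 'the', 'coast', 'now']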
The task of identifying bursts offers itself as the best approach to finding information of value as more people will get drawn into a conversation about something vital than about a certain individual reporting on currently drinking the most delicious cacao. In this project, we focus on utilizing one of the FSD technique which applies a Vector Space model and is optimized using Locality Sensitive Hashing. This technique is described in detail in [16], and we provide a brief overview 1 During a node failure, only the messages that have not been fully processed will get resent. This is as opposed to Storm’s at-least-once semantics, where some messages may get processed twice. MIPRO 2016/DC VIS in subsection III-B1. This FSD algorithm outputs a similarity score for each incoming message together with the ID of the message to which it is the most similar. The next stage involves the actual clustering, or bucketing to identify bursts. This is performed based on the message’s content similarity score and monitoring of the growth rate of the buckets in which similar tweets are gathered. The components of our system are described in sec. III-B. A. Distributed Platform Besides Spark Streaming2 , the Apache Storm Trident API is currently the best suited framework for our application. It simplifies the implementation of parallel tasks by providing a programming interface suitable for stream processing and continuous computation3 . As already mentioned in sec. I, the successful use of Storm Apache in data stream clustering applications has recently been reported [4], [5]. The advantages of using the Trident API over the core Storm API in our application are threefold: First, Trident can guarantee exactly-once semantics as opposed to at-least-once semantics of Storm. This means, that we don’t have to worry about already processed messages being resend in case of a failure in the computing cluster and therefore we will not be faced with potential erroneous bursty event reports. Another advantage of using Trident is the fact that it handles streams of batches of tuples as opposed to streams of tuples. This achieves better performance for communication with a database. Lastly, state persistence handling is incorporated generically in the Trident API which allows us to be flexible in the selection of our back-end technology. On the other hand, using Trident requires some additional checks for our algorithm, which we point out at the end of subsection III-B2. B. System Components Fig. 1 shows a component overview of the proposed system. In our setup, the topology is fed by data from a KafkaSpout, where the source of the stream is 1% of the Twitter firehose accessed through a Hosebird client4 . The first component of our system implements the core parts of the open source project “First Story Detection on Twitter using Storm“5 and adopts it for running on an actual distributed cluster with real-life data streams. For later queries, tweets are saved in the NoSQL distributed storage mechanism Apache Cassandra. The next component takes care of grouping of related tweets into buckets, which are monitored for bursts based on their growth rate. References to the bulk of tweets from the fastest growing bucket, i.e. the tweets form the burst, are also made persistent. The task of the third component is to decide whether the topic of the burst is of interest. The final component executes only if a crisis related new event was detected. 
Its task is to notify responsible operators from 2 http://spark.apache.org/streaming/ 3 http://storm.apache.org/ 4A robust Java HTTP library for consuming Twitter’s Streaming API 5 https://github.com/mvogiatzis/first-stories-twitter MIPRO 2016/DC VIS the area of the event occurrence, and to automatically spawn a new instance of a Twitter search tracking this event. 1) FSD using Locality Sensitive Hashing: In this section, we briefly explain the gist of the FSD component.6 This component outputs additional information for each incoming message: the Twitter ID of the closest neighbour message (the most similar post), and a score of similarity in the range of [0.0 − 1.0], where 1.0 identifies an identical post. Every newly arriving message from the Twitter data stream is split into words and a number of preprocessing steps are applied (e.g. removal of URLs and mentions, replacement of some, often intentionally, misspelled words, as well as simplistic7 word stemming). The cleaned up corpus is then submitted for calculation of the nearest neighbour. First, this process involves determining the TF-IDF (TermFrequency - Inverse Document Frequency) weighting for each term in the message to convert the representation of the tweet message into a vector normalized by Euclidean norm. The new vector must then be compared to the vectors of the preceding messages. An approximate near-neighbour among the seen documents can be found fast using the Locality Sensitive Hashing algorithm. It works on the assumption that similar tweets tent to hash to the same value. This allows for an efficient optimization where the number of documents to which comparison has to be made is greatly reduced. Namely, it uses hash tables to bucket similar tweets so that each incoming tweet will be compared with only the tweets that have the same hash.Finally, cosine similarly measure is applied to compute the distance of the nearest neighbour from the incoming tweet. If the calculated distance to the closest tweet is below a predefined threshold, this algorithm also uses an additional step of comparing the distance to a fixed number of latest tweets. This step alleviates the problem of possibly overlooking a closer tweet posted in the immediate time proximity. Such problem can easily arise given the method’s randomness of selecting tweets for comparison in the first place. 2) Bucketing and Identifying bursts: In effect, the process of grouping the posts that are similar also identifies a new event. In other words, if an incoming message was found to have a low similarity score, this message will be marked as a potential new event (by placing it in a new bucket). Then, given that there will be enough related/similar posts following it, this event will grow in its own cluster. In [16], this process is referred to as threading, whereas in [10] it is called the cluster summary. We chose to describe this process in two subtasks - one of bucketing and the other of identification of the bursts. The first task of this component is to gather similar tweets in the same bucket. Before bucketing, tweets whose similarity score is unfavorable (too low) are filtered out. I.e. 
if (1 − cosineDistance) < threshold, the tweet is included in further processing.

6 For a detailed explanation, refer to the original author's website at http://micvog.com/2013/09/08/storm-first-story-detection/
7 For performance reasons, we only apply stemming to English words that have a common stem in all their inflected variants.

Figure 1: System architecture overview. Trident topology, part 1 (clustering new bursty stories): a Kafka spout feeds the real-time stream of tweets (1% of the roughly 350,000-tweets-per-minute firehose) into the FSD component (optimized Locality Sensitive Hashing), which emits the incoming tweet ID, the colliding tweet ID and a proximity score; bucketing and burst identification follow, and the selected tweets as well as the burst content (topic, keywords, tweet IDs, texts, users and locations) are saved into Cassandra (C*). Trident topology, part 2 (disaster detection and action): the topic's keywords extractor, topic matching against the disaster dictionary, localizing the event, localized operator notification, and the alert notifier spawning streaming and historical filtered searches.

The predefined threshold was found to give the best results when kept in the range of [0.5, 0.6]. As explained in [16], higher threshold values cause the topics in a bucket to be diverse and plentiful, while a lower threshold makes the topics very specific and scarce. A passing tweet will then be placed either in a new bucket, if its nearest neighbour is not already in one of the existing buckets, or it will join the already existing cluster of similar tweets in their bucket. If it ends up in a new bucket, it is considered to potentially contribute to a new story discussing previously unseen content.

The number of buckets is finite, so whenever all buckets are occupied and a new-story tweet streams in, the bucket with the lowest timestamp of its most recent content update is freed up for reuse. The size of a bucket is also limited: whenever a new tweet is determined to belong to a full bucket, we simply remove the older half of the bucket to free up space. The assumption is that the tweets the removed IDs refer to are not important (too far apart in time to form a burst), since otherwise they would already have been reported as a burst.

An important Trident implementation detail lies in the necessity to first simulate the filling of the buckets. This is because Trident works on batches and, therefore, the state of the buckets gets updated only at the end of a batch. Consider the case where a new batch contains a new event as its first tweet, and the tweets following it are related only to that first tweet. Since that tweet is not placed in a bucket before the next tweet is processed, the next tweet would also be (incorrectly) identified as a new event and a new bucket would be assigned to it. This is a clear disadvantage of using batches in our algorithm; however, it pays off later when communication with a database system is required.

Finally, once in a while a fastest-growing bucket will be detected. Before passing the burst content down the processing pipeline, the Burst Event Detector determines whether this bulk of tweets can be considered a burst by checking the time span within which the tweets were posted. The burst content is then consumed by the Disaster Detector component.
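A compressed, in-memory sketch may make the bucket bookkeeping above easier to follow. It is illustrative only: the real system keeps this state in Trident, and the bucket count, bucket size and similarity threshold below are assumptions.

# Illustrative bucketing / burst-candidate bookkeeping (plain Python, not the
# Trident state implementation; limits and threshold are assumed values).
import time

MAX_BUCKETS, MAX_SIZE, SIM_THRESHOLD = 100, 500, 0.55

buckets = []        # each bucket: {"tweets": [tweet ids], "updated": timestamp}
member_of = {}      # tweet id -> index of the bucket it was placed in

def place(tweet_id, neighbour_id, similarity):
    if similarity < SIM_THRESHOLD:
        return                                   # too dissimilar: filtered out
    if neighbour_id in member_of:                # join the neighbour's bucket
        idx = member_of[neighbour_id]
    elif len(buckets) < MAX_BUCKETS:             # potential new story
        buckets.append({"tweets": [], "updated": 0.0})
        idx = len(buckets) - 1
    else:                                        # reuse least recently updated
        idx = min(range(len(buckets)), key=lambda i: buckets[i]["updated"])
        buckets[idx] = {"tweets": [], "updated": 0.0}
    bucket = buckets[idx]
    if len(bucket["tweets"]) >= MAX_SIZE:        # full bucket: drop older half
        bucket["tweets"] = bucket["tweets"][MAX_SIZE // 2:]
    bucket["tweets"].append(tweet_id)
    bucket["updated"] = time.time()
    member_of[tweet_id] = idx

def burst_candidate(window_seconds=60):
    # The fastest-growing recently updated bucket; the Burst Event Detector
    # would additionally check the time span of the tweets it contains.
    now = time.time()
    recent = [b for b in buckets if now - b["updated"] < window_seconds]
    return max(recent, key=lambda b: len(b["tweets"]), default=None)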
MIPRO 2016/DC VIS 3) Disaster Detector: The task of the Disaster Detector is to find whether the captured event is an emergency situation and, if it is, where it is located. First, the keywords must be extracted and based on them, utilizing our disaster type dictionary, we identify the topic. Finally, if the topic matches a disaster situation, we localize it. At this stage, database query is necessary to access the tweet. a) Extracting keywords: As keywords, we wish to find five words that best summarize the content of the burst. Our approach to finding them involves calculating TF within the burst. In other words, we only weigh each term positively for the number of times it occurs within the burst. The disaster dictionary is meant to contain words representative of a specific disasters. For our testing purposes, we populated our dictionary by representative keywords of a hurricane and snowstorm disasters. The datasets were compiled from querying Twitter’s historical stream for the specific disaster types during their known occurrences. b) Identifying disaster type: The topic is identified by matching any of the keywords to the part of the dictionary describing a given (predefined) topic. As an additional check, needed to prevent overlapping terms causing us to select the wrong topic, we weigh the keywords negatively relative to the number of times it occurs in the dictionary entry for other topics. The entries for topic, keywords, IDs of the contributing tweets and their text, location (if given) and users’ information are saved to Cassandra. The database can be queried by other topologies (possibly spawned later to do retrospective analysis on the tweets) without affecting the performance of our system. c) Geo-localizing the event: In order to inform only those crisis response agencies that are directly concerned with the occurring crisis situation, we need to be able to determine where the event is happening. Spacial grounding of the tweets in the burst requires to consider not only the coordinates, if given, but also the user’s location as well as the places mentioned directly in the text of the tweet. This non-trivial task is carried out by the Geo-Tagger subcomponent we describe in our previous work [23]. 4) Alert Component: Once a disaster event is detected and its location, type and representative keywords are determined, we pass the information to the Alert component. The keywords are used to start up new data collection sessions from Twitter both from recent history as well as from a real-time stream. Finally, the responsible operators can be notified who, for the duration of the disaster, are assumed to administer searches also from other social media channels using our CrowdSA application presented in [23]–[25]. IV. R ESULTS AND E VALUATION Due to the difficulty of determining the “ground truth” and the appropriate metrics, a fully functional evaluation of our system (the correct and near real-time detection of a crisis situation) is beyond the scope of the present work. However, we devised three types of tests to show that our proposal of implementing the detection of crisis situations from Twitter MIPRO 2016/DC VIS content in Trident is reasonable. All tests carried out are run in an unsupervised mode, where we do not assume to know a priori the type of event being detected. The data collected for this purpose are from hurricane Iselle reaching Hawaii in August 2014, snowstorm affecting the US East cost in January 2016, and hurricane Patricia hitting Mexico in autumn 2015. 
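Before turning to the tests, the Disaster Detector's keyword extraction and dictionary matching can be illustrated with a small sketch. The dictionary entries and the exact penalty for overlapping terms are our assumptions; only the overall idea (plain term frequency inside the burst, negative weighting for words shared between topics) follows the description above.

# Sketch of burst keyword extraction and disaster-dictionary matching
# (illustrative; dictionary contents and the 0.5 penalty factor are assumed).
from collections import Counter

DISASTER_DICT = {
    "hurricane": {"hurricane", "storm", "wind", "landfall", "evacuate"},
    "snowstorm": {"snow", "blizzard", "whiteout", "plow", "storm"},
}

def top_keywords(burst_tokens, k=5):
    # Weigh each term only by its frequency within the burst (plain TF).
    return [w for w, _ in Counter(burst_tokens).most_common(k)]

def match_topic(keywords):
    scores = {}
    for topic, vocab in DISASTER_DICT.items():
        hits = sum(1 for w in keywords if w in vocab)
        # Penalize keywords that also appear in other topics' entries, so an
        # overlapping word such as "storm" does not decide the match alone.
        overlap = sum(1 for w in keywords if w in vocab
                      for other, v in DISASTER_DICT.items()
                      if other != topic and w in v)
        scores[topic] = hits - 0.5 * overlap
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

kw = top_keywords("snow snow blizzard roads closed storm snow plow".split())
print(kw, match_topic(kw))      # expect the snowstorm topic to win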
In the first test case, we published our collected historical data to the Kafka queue and measured the processing speed in terms of the number of tweets processed in a second. We were able to reach an average (for the three different datasets) processing speed of about 4000 messages per second. This is below what the full Twitter firehose could serve (on average, there are about 6000 messages posted on Twitter in one second8 ). However, after increasing the parallelization and utilizing a larger cluster, the performance can be boosted to surpass the firehose requirements. In the second testing scenario, we evaluated the effectiveness of the burst detection capability and the keyword extraction by consuming the historical datasets from the hurricane Iselle. Our system could detect a number of bursts within the datasets occurring relative to the time of the day. Most importantly, the largest burst was detected on the 10th of August, which corresponds to our graphical inspection of the dataset. For the third test, we fed the system by real-time Twitter streaming data. These account for about 1% of the Twitter Firehose. Since it was not expected to encounter any hurricane or snowstorm disaster during the run of the test, we let the system report on all bursts which it found, essentially skipping the Disaster Detector component. We observed that even with only six nodes, the system was performant in that it processed tweets fast enough without getting congested and falling behind. All of the above tests were run on a six node virtual cluster with 8 cores each, 64 bit CentOS, and 16GB of RAM. The cluster was provisioned using Ambari9 and runs Storm, Kafka broker, and a Cassandra server. The 1st node runs Nimbus (Storms master daemon) while the rest are the worker nodes (running Storm’s Supervisors). Greater parallelization within Storm was selectively applied for intensive tasks such as the dot product calculation in similarity estimations. As a future evaluation strategy, we plan to experiment with different parallelization setups within Storm while processing the same data stream of tweets to better understand the capabilities of the distributed framework. V. C ONCLUSION AND F UTURE W ORK In this paper, we reported on our work that focused on the problem of near real-time (online) Burst Topic Detection from streaming Twitter data in application to detecting newly occurring crisis situations. Our approach involves clustering tweet messages based on content similarity and implementing the algorithms in a parallel fashion using the Apache Storm 8 http://www.internetlivestats.com/one-second/ 9 https://ambari.apache.org/ 267 Trident API. Our results have shown that the use of First Story Detection in combination with capturing temporal bursts of similar messages streamed from a Twitter firehose, and mapping the captured events to a predefined disaster dictionary enables fast reporting of a crisis situation when implemented in a distributed fashion. As a future work, we are interested in looking at the potential improvements if the structure of the social network is also considered in the similarity measurements. Additionally, we plan to compare the results of the unsupervised event detection (as currently performed by our setup) with a supervised approach in which we would use our dictionary to filter tweets before they are passed to the clustering component. Last but not least, we would like to experiment with incorporating a more sophisticated model for filtering out the tweets before bucketing. 
Namely, since in the case of a large burst, it might be more effective to also include the very near tweets in the processing, we’d like to use a dynamic threshold value updated based on the current state of the stream. ACKNOWLEDGMENT The authors would like to thank Matthias Steinbauer and the Telecooperation Institute of JKU Linz for providing cloud computing resources and support during provisioning our cluster. This work has been funded by the Austrian Federal Ministry of Transport, Innovation and Technology (BMVIT) under grant FFG BRIDGE 838526 and under grant AD WTZ AR10/2015. R EFERENCES [1] J. Kleinberg, “Bursty and hierarchical structure in streams,” in Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’02, (New York, NY, USA), pp. 91–101, ACM, 2002. [2] Q. He, K. Chang, and E.-P. Lim, “Analyzing feature trajectories for event detection,” in Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’07, (New York, NY, USA), pp. 207–214, ACM, 2007. [3] J. Weng and B.-S. Lee, “Event detection in twitter.,” ICWSM, vol. 11, pp. 401–408, 2011. [4] X. Gao, E. Ferrara, and J. Qiu, “Parallel clustering of high-dimensional social media data streams,” in Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on, pp. 323– 332, May 2015. [5] G. Wu, O. Boydell, and P. Cunningham, “High-throughput, web-scale data stream clustering,” in Proceedings of the 4th Web Search Click Data workshop (WSCD 2014), 2014. [6] K. XIANGSHENG, “Microblog mining based on cloud computing technologies: Mesos and hadoop.,” Journal of Theoretical & Applied Information Technology, vol. 48, no. 3, 2013. [7] J. Yin, A. Lampert, M. Cameron, B. Robinson, and R. Power, “Using social media to enhance emergency situation awareness,” Intelligent Systems, IEEE, vol. 27, pp. 52–59, Nov 2012. [8] J. Rogstadius, M. Vukovic, C. A. Teixeira, V. Kostakos, E. Karapanos, and J. A. Laredo, “Crisistracker: Crowdsourced social media curation for disaster awareness,” IBM J. Res. Dev., vol. 57, pp. 1:4–1:4, Sept. 2013. 268 [9] M. Nagarajan, K. Gomadam, A. P. Sheth, A. Ranabahu, R. Mutharaju, and A. Jadhav, Web Information Systems Engineering - WISE 2009: 10th International Confee rence, Poznań, Poland, October 5-7, 2009. Proceedings, ch. Spatio-Temporal-Thematic Analysis of Citizen Sensor Data: Challenges ann d Experiences, pp. 539–553. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. [10] C. C. Aggarwal and K. Subbian, “Event detection in social streams,” in In Proceeding of the Twelfth SIAM International Conference on Data Mining, (Anaheim, California, USA), pp. 624–635, SIAM / Omnipress, April 26-28 2012. [11] H. Smid, P. Mast, M. Tromp, A. Winterboer, and V. Evers, “Canary in a coal mine: Monitoring air quality and detecting environmental incidents by harvesting twitter,” in CHI ’11 Extended Abstracts on Human Factors in Computing Systems, CHI EA ’11, (New York, NY, USA), pp. 1855– 1860, ACM, 2011. [12] T. Sakaki, F. Toriumi, and Y. Matsuo, “Tweet trend analysis in an emergency situation,” in Proceedings of the Special Workshop on Internet and Disasters, SWID ’11, (New York, NY, USA), pp. 3:1–3:8, ACM, 2011. [13] T. Sakaki, M. Okazaki, and Y. Matsuo, “Earthquake shakes twitter users: Real-time event detection by social sensors,” in Proceedings of the 19th International Conference on World Wide Web, WWW ’10, (New York, NY, USA), pp. 851–860, ACM, 2010. [14] F. Atefeh and W. 
Khreich, “A survey of techniques for event detection in twitter,” Comput. Intell., vol. 31, pp. 132–164, Feb. 2015. [15] H. Becker, M. Naaman, and L. Gravano, “Learning similarity metrics for event identification in social media,” in Proceedings of the Third ACM International Conference on Web Search and Data Mining, WSDM ’10, pp. 291–300, ACM, 2010. [16] S. Petrović, M. Osborne, and V. Lavrenko, “Streaming first story detection with application to twitter,” in Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT ’10, pp. 181–189, Association for Computational Linguistics, 2010. [17] W. Xie, F. Zhu, J. Jiang, E.-P. Lim, and K. Wang, “Topicsketch: Realtime bursty topic detection from twitter,” in Data Mining (ICDM), 2013 IEEE 13th International Conference on, pp. 837–846, Dec 2013. [18] Q. Diao, J. Jiang, F. Zhu, and E.-P. Lim, “Finding bursty topics from microblogs,” in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, ACL ’12, (Stroudsburg, PA, USA), pp. 536–544, Association for Computational Linguistics, 2012. [19] S. Liu, D. Shaw, and C. Brewster, “Ontologies for crisis management: a review of state of the art in ontology design and usability,” in Proceedings of the Information Systems for Crisis Response and Management conference (ISCRAM 2013 12-15 May, 2013), 2013. [20] D. De Wrachien, J. Garrido, S. Mambretti, and I. Requena, “Ontology for flood management: a proposal,” Flood Recovery, Innovation and Response III, vol. 159, p. 3, 2012. [21] Y. Matsuo and M. Ishizuka, “Keyword extraction from a single document using word co-occurrence statistical information,” International Journal on Artificial Intelligence Tools, vol. 13, no. 01, pp. 157–169, 2004. [22] W. X. Zhao, J. Jiang, J. He, Y. Song, P. Achananuparp, E.-P. Lim, and X. Li, “Topical keyphrase extraction from twitter,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pp. 379–388, Association for Computational Linguistics, 2011. [23] A. Salfinger, W. Retschitzegger, W. Schwinger, and B. Pröll, “Crowd sa – towards adaptive and situation-driven crowd-sensing for disaster situation awareness,” in Cognitive Methods in Situation Awareness and Decision Support (CogSIMA), 2015 IEEE International InterDisciplinary Conference on, pp. 14–20, IEEE, 2015. [24] A. Salfinger, S. Girtelschmid, B. Pröll, W. Retschitzegger, and W. Schwinger, “Crowd-sensing meets situation awareness: A research roadmap for crisis management,” in System Sciences (HICSS), 2015 48th Hawaii International Conference on, pp. 153–162, IEEE, 2015. [25] A. Salfinger, W. Retschitzegger, W. Schwinger, and B. Pröll, “Mining the disaster hotspots – situation-adaptive crowd knowledge extraction for crisis management,” in Cognitive Methods in Situation Awareness and Decision Support (CogSIMA), 2016 IEEE International InterDisciplinary Conference on, in press, 2016. 
MIPRO 2016/DC VIS Automatic protocol based intervention plan analysis in healthcare Miklos Kozlovszky1,2, Levente Kovács3, Khulan Batbayar1, Zoltán Garaguly1 Biotech Knowledge Center/Obuda University, Budapest, Hungary MTA SZTAKI/Laboratory of Parallel and Distributed Computing, Budapest, Hungary 3 Physiological Controls Group/Obuda University, Budapest, Hungary 1 2 {kozlovszky.miklos@nik, kovacs.levente@nik, khulan batbayar@biotech,garaguly.zoltan@biotech}.uni-obuda.hu, Abstract - Evidence and protocol based medicine decreases the complexity and in the same time also standardizes the healing process. Intervention descriptions moderately open for the public, and they differ more or less at every medical service provider. Normally patients are not much familiar about the steps of the intervention process. There is a certain need expressed by patients to view the whole healing process through intervention plans, thus they can prepare themselves in advance to the coming medical interventions. Intervention plan tracking is a game changer for practitioners too, so they can follow the clinical pathway of the patients, and can receive objective feedbacks from various sources about the impact of the services. Resource planning (with time, cost and other important parameters) and resource pre-allocation became feasible tasks in the healthcare sector. The evolution of consensus protocols developed by medical professionals and practitioners requires accurate measurement of the difference between plans and real world scenarios. To support these comparisons we have developed the Intervention Process Analyzer and Explorer software solution. This software solution enables practitioners and healthcare managers to review in an objective way the effectiveness of interventions targeted at health care professionals and aimed at improving the process of care and patient outcomes. Keywords health care intervention, process analyzer I. INTRODUCTION A growing demand can be seen in the healthcare sector for value-added, premium category medical services, which involves continuous medical monitoring and care. Another aspect within Europe is the increasing tendency of patient tourism and the ever growing number of medical services for foreigners at premium service providers. Interventions at a premium medical service provider has always well defined parameters (e.g.: cost, duration, etc.) and the whole intervention plan of the patients can be easily represented by timed graph structures. Intervention plan is part of the clinical pathway and in larger scale it is an important part of the patient’s life path. MIPRO 2016/DC VIS An intervention plan is basically builds up from generic consensus protocol(s), additionally it is always patient-centered thus it contains some patient related personalized parameter set as well. If we could compare the planned interventions (intervention plan) with the occurred interventions we could objectively assess the differences in medical services, analyze their impact, the effectiveness of the core consensus protocol, and the real usage scenarios of the underlying protocol. For such motivation we have created an Intervention Process Analyzer and Explorer software solution, which is automatically able to assess deviations from the intervention plan in an objective way. Our paper builds up as follows: In the first section we give an overview about our term definitions, in the next section we detail the predefined requirements of the solution and the identified data sources. 
This will be followed by an overview about the intervention plan analysis (about the analysis levels, and the used methods), then we provide a short descript about the internal architecture of the solution and finally summarize our results. II. STATE-OF-THE-ART In recent years Electronic Medical Record and medical billing process is merging together, and provides solid foundation for more effective tracking of patients, medical services and interventions. Patient Health Records (PHR) are evolving into digital format. HIMSS Analytics developed [13] the Electronic Medical Record Adoption Model (EMRAM) in 2005 as a methodology for evaluating the progress and impact of electronic medical record systems for hospitals and hospital ambulatory facilities. In this model eight stages (0-7) has been identified, that measures a hospital’s implementation and utilization of digital information technology applications. The Electronic Health Record (EHR) is basically a longitudinal electronic record of patient health information generated by one or more encounters in any care delivery setting. It contains information about patient demographics, progress notes, problems, medications, vital signs, past medical history, immunizations, laboratory data and various reports. 269 Electronic Health Records can be stored at medical service providers or accessed online by public [10, 11, 12]. Workflows are invading many medical fields. For example surgical workflows and their analysis is nowadays a hot topic, thus procedure modeling, workflow optimization, and skill assessment can provide more effective and precise procedures in the operating room (OR). Other medical workflows, treatment plans are usually accessible only as consensus protocols, however the real implementation of such protocols (orders, used materials, drugs, timelines, etc.) differs significantly from hospital to hospital. Some generic solutions are available both at the academic side (quantitative and qualitative workflow and EHR system evaluations), and at the commercial side (e.g.: the HealthCloud from Salesforce [8], and cloud based Practice Management systems). A reliable and robust coding scheme and naming convention system is inevitable to build up future proof intervention plan databases. Recently there are still problems in the healthcare sector: • coding schemes differs at country level (here we should note that most of the codes based on ICDx published by WHO [14]) • coding schemes are by default rigid structures, however due to the new medical interventions their scope (list) is always growing (WHO’s most recent version is ICD-10, ICD-10 was endorsed by the 43th World Health Assembly in May 1990 and came into use in WHO Member States as from 1994. ICD is currently under revision, and planned release date for ICD-11 is 2018. • new national level coding and naming convention scheme versions are in many cases not backward compatible , • different codes can be mapped to the same element. A. Used term definitions In the followings we describe shortly the most important basic terms for our paper defined/adapted from literature [1, 4, 5]: • Protocol: generic set of interventions (events and activities), and rules of a certain healthcare domain for a well defined group of patients, which was developed as a consensus agreement by a team of experts. 
It can have a constant evolution over time (can have version, can be expired after a well defined period of time), it based on the best practices of the healthcare professionals (and also on patient expectations) and the recorded patient statistics. It contains a set of events and activities with time and spatial constraints. Its internal structure can be represented as formal process description or a graph (protocol graph), which can contain iterations, conditional alternative paths. 270 • Intervention plan: Belongs to a single patient and contains events, activities, medical services with time, location, and resource parameters in a personalized manner. It can contain multiple protocols and also can contain complex control patterns (such as iterations, conditions, etc.) • Intervention plan graph: Visualization of the intervention plan as a directed graph. • Clinical pathway: Contains a set of events and activities with time and spatial constraints described in minimum one (personalized) intervention plan. • Realized interventions: Multisource dataset, which belong to a single patient, and contains event, activity and medical service logs, with time, location and patient data and various resource parameters. It is a clear reflection what was historically occurred with the patient during its clinical pathway. III. AIMS AND REQUIREMENTS Premium medical service providers are only viable, if they able to run their “business” with high efficiency. The market filters out all the medical service providers, which are not able to assess their potential and the quality of their services. These data can only collected with accountable, objective measurements, and continuous high resolution service monitoring. On the patient side new requirements appeared, such as pre-evaluation and interactive monitoring of health services and intervention modeling/virtualized service models with high accuracy. Our Intervention Process Analyzer and Explorer solution provides effective solution to do comparative analysis between the intervention plans and the occurred interventions, assess the difference and provide large scale statistics about the frequently used intervention scenarios. As identified requirements the realized solution should support combination with external healthcare databases and data mining applications to reveal more aspects of healthcare services, and their effectiveness on patient health. The analysis process should support offline running mode, totally independent from any interventions, without any intervention plan status restrictions (if the targeted intervention plan is not in a finished or closed state). IV. DATA SOURCES Our software solution collects information from various data sources, and stores these data in a semi structured data repository for further processing. The simplified system overview is shown in Figure 1. MIPRO 2016/DC VIS Figure 2. Intervention plan editor and visualizer GUI Figure 1. System overview The collected main data sets are the following: • Patient data o Patient base data o Patient survey data (satisfaction survey, etc.) o Anamnesis data • Healthcare service data • Historical logs received from the medical service provider or any healthcare systems • Protocol data • Intervention plan data • Realized intervention data (from the healthcare system) V. A. Analysis process Two types of analysis have been identified: simple analysis (individual intervention plan vs. 
realized intervention set of a single patient) and population-scale analysis (a complex, multi-parameter comparative analysis of the intervention plans of a set of patients), shown in Figure 3:

• Simple analysis compares the intervention plan's graph structure, using almost 150 different pre-defined performance indicators, with the occurred interventions and the measured parameters. We try not only to compare the parameters but also to assess the impact of the differences.
• Population-scale analysis can be a handy tool for practitioners and healthcare managers to create statistical analyses of protocol usage, average statistical parameters and expected patient outcomes.

BUILDING UP AND ANALYSIS OF INTERVENTION PLANS

On the intervention plan editor GUI (shown in Figure 2), the intervention plan can be described as an arbitrarily complex workflow, where small circles represent start/stop events, X represents alternative pathways with conditions, large boxes with labels and small icons represent pre-defined intervention processes, and arcs define the direction of the path. Intervention plan analysis can be done from many viewpoints. In the GUI, different user groups (healthcare managers, medical experts/practitioners) can run analyses both on an individual intervention plan and on a large set of intervention plans.

Figure 3. Intervention plan vs. realized intervention comparison

B. Analysed parameters

We have identified a large set of intervention task parameters (Pi) which help, during evaluation, to objectively measure how well the plans match the real world:
• Sequence of the intervention steps
• Number of intervention steps
• Intervention step sub-parameters
• Conditional path decision accuracy
• Iterations
• Service resource consumption
Logically, an intervention step parameter can be arbitrarily complex and can contain an undefined number of sub-parameters in a recursive form.

C. Used algorithms

We use different types of algorithms to compare an intervention plan with the realized interventions:
• the Boyer-Moore algorithm [6], which is basically a string searching algorithm;
• the Needleman-Wunsch algorithm [7], used mainly in bioinformatics to align protein or nucleotide sequences using dynamic programming; it provides global sequence alignment with penalties;
• the Smith-Waterman algorithm [8], which is also a bioinformatics-oriented matching algorithm; it provides local sequence alignment using penalties.

D. Parameter difference evaluation

After the analysis process, we evaluate the parameter value differences. We denote the i-th parameter of intervention process P as Pi. We have defined an impact score matrix and assigned an impact score (0 < KPi < 1) to each member of the parameter set. We use a simplified linear evaluation function (1) to calculate the impact of the differences (I), where Pi holds the planned and P'i the actually occurred intervention values. Function (2) simply calculates the weighted difference of the planned and occurred intervention tasks; we note that the impact score values are defined subjectively. The larger I is, the larger the distance between the plan and the real-world scenario. We can use the calculated values to search for alternative intervention graph paths or simply to optimize between the alternatives.
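Since the bodies of equations (1) and (2) are not reproduced in this copy, the following sketch is only our reading of the weighted-absolute-difference evaluation described above; the parameter names, impact scores and values are invented for illustration.

# Illustrative impact evaluation (our interpretation of the description above,
# not the authors' exact formulas; parameter names and scores are invented).
def impact(planned, occurred, impact_score):
    # Weighted absolute difference over the comparable parameters:
    # a larger I means a larger distance between the plan and reality.
    return sum(impact_score[p] * abs(planned[p] - occurred[p])
               for p in planned if p in occurred)

planned  = {"duration_min": 45, "visits": 3, "cost": 200.0}
occurred = {"duration_min": 60, "visits": 4, "cost": 260.0}
K = {"duration_min": 0.2, "visits": 0.9, "cost": 0.05}    # 0 < K_Pi < 1
print(impact(planned, occurred, K))    # ~6.9 = 0.2*15 + 0.9*1 + 0.05*60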
Multiple impact score matrices can be used to analyze the intervention graphs from different viewpoints (a medical service provider can easily have different subjective impact values than the patients). In the current version of our solution, only intervention process parameters of the same type can be compared; this is because we do not use a substitution matrix to define mapping possibilities between different intervention processes and their parameters. The evaluation results give us a handy way to compare and analyze intervention plans against the occurred intervention records. Similarities help to define real consensus intervention paths. The measurable differences are interesting investigation points where a premium medical service provider can gain vital information about its service quality and the patients' requirements.

VI. SUMMARY

The aim of our research was to define a medical intervention plan analyzer software solution, equipped with accurate intervention evaluation algorithms and combined with a user-friendly graphical interface. With the designed and developed software solution we are able to compare arbitrarily complex intervention graph structures with occurred interventions received as patient health records or intervention logs. Planned intervention task parameters are mapped to the occurred intervention parameters automatically. We have defined a weight for each intervention parameter and calculate the absolute distance of the two parameter values as the impact of the occurred differences. From the parameter evaluation a lot of "hidden" information can be extracted, such as the planning accuracy of the medical professionals, the difference between the implemented intervention plans and the official, so-called consensus intervention protocols, the correlation between the actually occurred interventions and the patient outcomes, or between the user satisfaction level and the occurred interventions. The developed framework is a generic intervention plan analyzer; however, it can be used or adapted easily for a large set of medical domains. We have successfully validated the system within the premium diabetes and dental care medical service domains in Hungary.

VII. ACKNOWLEDGMENTS

The projects have been supported by the European Union. The authors would like to thank GOP-1.1.1-112012-0055- Dialogic „DIALOGIC – Mathematical model based decision support system for diabetes monitoring" for its financial support. The authors would like to thank PPT Ltd. for their research support, advice and useful contributions.

REFERENCES
[1] Surján György, Borbás Ilona, Gődény Sándor, Juhász Judit, Mihalicza Péter, Pékli Márta, Kincses Gyula, Varga Eszter, Juhász
"A Fast String Searching Algorithm.". Comm. ACM (New York, NY, USA: Association for Computing Machinery) 20 (10): 762–772. doi:10.1145/359842.359859. ISSN 0001-0782. Needleman, Saul B.; and Wunsch, Christian D. (1970). "A general method applicable to the search for similarities in the amino acid MIPRO 2016/DC VIS [8] [9] [10] [11] [12] [13] [14] sequence of two proteins". Journal of Molecular Biology 48 (3): 443–53. doi:10.1016/0022-2836(70)90057-4. PMID 5420325 HealthCloud from Salesforce: http://www.salesforce.com/industries/healthcare/healthcloud/?d=70130000002DuoI , accessed on 20. March. 2016 Mala Ramaiah, Eswaran Subrahmanian, Ram D Sriram, Bettijoyce B Lide; Workflow and Electronic Health Records in Small Medical Practices; Perspect Health Inf Manag. 2012 Spring; 9(Spring): 1d.Published online 2012 Apr 1.PMCID: PMC3329208 EHR Portal http://www.practicefusion.com/phr/ , accessed on 20. March. 2016 ChARM PHR Portal https://charmphr.com/login.sas , accessed on 20. March. 2016 IntelliChart Portal http://www.intelichart.com/ , accessed on 20. March. 2016 EMRAM http://www.himssanalytics.org/providersolutions#block-himss-general-himss-prov-sol-emram , accessed on 20. March. 2016 ICD-10, Classification of Diseases and Related Health Problems 10th Revision, Volume 2, Instruction manual, World Health Organization, 2010 Edition, ISBN 978 92 4 154834 2 (NLM classification: WB 15) 273 Using Fourier and Hartley Transform for Fast, Approximate Solution of Dense Linear Systems Željko Jeričević* and Ivica Kožar** * ** Department of Computer Engineering/Engineering Faculty, Rijeka, Croatia Department of Computer Modeling/Civil Engineering Faculty, Rijeka, Croatia zeljko.jericevic@riteh.hr Abstract - The solution of linear system of equations is one of the most common tasks in scientific computing. For a large dense systems that requires prohibitive number of operations of the order of magnitude n3, where n is the number of equations and also unknowns. We developed a novel numerical approach for finding an approximate solution of this problem based on Fourier or Hartley transform although any unitary, orthogonal transform which concentrates power in a small number of coefficients can be used. This is the strategy borrowed from digital signal processing where pruning off redundant information from spectra or filtering of selected information in frequency domain is the usual practice. The procedure is to transform the linear system along the columns and rows to the frequency domain, generating a transformed system. The least significant portions in the transformed system are deleted as the whole columns and rows, yielding a smaller, pruned system. The pruned system is solved in transform domain, yielding the approximate solution. The quality of approximate solution is compared against full system solution. Theoretical evaluation of the method relates the quality of approximation to the perturbation of eigenvalues of the residual matrix. Numerical experiments illustrating feasibility of the method and quality of the approximation, together with operations count are presented.. I. INTRODUCTION The solution of a system of linear equations is one of the basic methods in scientific computing. In the case of dense systems the order of magnitude N3 multiplications is required (N is dimension of a square matrix)... This paper extends the previous work [1],[2] in which we developed a framework for efficient linear least squares problems. 
We also supplement the previous analysis with classical forward and backward error analysis [3]. Some equations developed previously [1] are briefly repeated here as final results, without the details. The basic idea of constructing the approximate solutions for large, dense systems using the Fourier or Hartley space representation remains the same.

II. THEORY
The Fourier transform is formally done by multiplication with the Fourier matrix, but in actual computations the FFT is used whenever possible.

b = Ac (1)

Applying the Fourier transform to the system of linear equations (1) is done along the columns. Premultiplying (1) with the Fourier matrix F and inserting F^{-1}F = I after the matrix A, the transform of the right-hand-side vector b and of the columns of matrix A, as well as the inverse Fourier transform along the rows of A and the transform of the solution vector c, are performed (2). It is a good idea that, before the transformation, the elements of vector b and the corresponding row and column vectors in matrix A are sorted in such a way that the transform has close to the most compact representation. That concentrates the energy of the Fourier transform in as small a number of frequencies as possible.

Fb = (FAF^{-1})(Fc) (2)

To avoid computations with complex numbers in the case of real systems, the Hartley transform [4] can be used. This transformation of the original system, with the terms grouped as shown by the parentheses, will yield the Fourier or Hartley transform of vector c as the solution of system (2).

III. METHOD
After the transformation the system (2) is of the same size as the original system, but it can be pruned down by deleting rows and columns containing "insignificant" information, yielding a smaller system. The information is termed insignificant in the signal processing sense: the frequencies whose magnitude is a smallest percentage of the total magnitude are discarded. The selection of significant frequencies is accomplished by computing the magnitudes of the frequencies in vector b and sorting them in decreasing order. For columns, the sum of frequencies for each column is used for that purpose. Using that information, the transforms of vector b and matrix A can be shortened into significant parts to be retained and insignificant parts to be discarded. The solution of the pruned system will yield the Fourier interpolation of the solution vector c. This approach decreases the size of the system and represents the filtering out of non-significant frequencies in order to build a smaller model system with fewer equations and fewer unknowns than the original system [5]. In this respect our approach is different from Beylkin's [6], whose idea was to increase the sparsity of the matrix by using the wavelet transform.

Using the residual matrix R and a Taylor series expansion [1] it was shown that the approximate inverse converges faster to the exact inverse the smaller the eigenvalues (λ) of the residual matrix R are:

x = Fc, y = Fb, y = Bx, B = FAF^{-1}
B_p^{-1} ≈ B^{-1}, R ≡ I − B_p^{-1}B ⇒ B_p^{-1}B = I − R, B = B_p(I − R), B^{-1} = (I − R)^{-1}B_p^{-1}
δx = (I − R)^{-1}B_0^{-1}δy
(I − R)^{-1} = (QQ^{-1} − QΛQ^{-1})^{-1} = [Q(I − Λ)Q^{-1}]^{-1} = Q(I − Λ)^{-1}Q^{-1}
lim_{λ_i→0} Q diag(1/(1 − λ_i)) Q^{-1} → I, i = 1, ..., n
(I − R)^{-1} = I + R + R^2 + R^3 + ...   (3)
B_n^{-1} ≡ (I + R + ... + R^n)B_p^{-1}, B_∞^{-1} → B^{-1}

where the index p denotes the solution of the pruned system.
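To make the pruning procedure concrete, the following is a minimal NumPy sketch of the idea rather than the authors' implementation: it forms the transformed system B = FAF^{-1} and y = Fb with the FFT, keeps the rows with the largest magnitudes in y and the columns with the largest magnitude sums, solves the pruned system, and recovers the approximation of c with an inverse FFT. The function name pruned_solve, the keep parameter and the Hilbert-matrix example are ours; the pre-sorting for compactness and the Hartley variant are omitted.

```python
import numpy as np

def pruned_solve(A, b, keep):
    """Approximate solve of A c = b by pruning the Fourier-transformed system.

    keep -- number of rows/columns (frequencies) retained in the pruned system.
    A sketch of Sections II-III; the selection heuristics are assumptions,
    not the authors' implementation.
    """
    n = A.shape[0]
    y = np.fft.fft(b)                                # y = F b
    B = np.fft.ifft(np.fft.fft(A, axis=0), axis=1)   # B = F A F^{-1}

    # Rows: keep the frequencies with the largest magnitude in y.
    rows = np.argsort(np.abs(y))[::-1][:keep]
    # Columns: keep the columns with the largest total magnitude.
    cols = np.argsort(np.abs(B).sum(axis=0))[::-1][:keep]

    # Solve the pruned (keep x keep) system in the transform domain.
    x_p = np.linalg.solve(B[np.ix_(rows, cols)], y[rows])

    # Re-embed and return to the original domain (Fourier interpolation of c).
    x = np.zeros(n, dtype=complex)
    x[cols] = x_p
    return np.fft.ifft(x).real

# Example: a 5x5 Hilbert system, pruned to 4 frequencies.
n = 5
A = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1.0)
c_true = np.ones(n)
b = A @ c_true
c_approx = pruned_solve(A, b, keep=4)
print(np.linalg.norm(c_true - c_approx) / np.linalg.norm(c_true))  # forward error
```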
Here B_p is the pruned approximation of B based on the most significant frequencies. Solving the model system will yield the approximate solution of the original system (1). The approximate solution (x + δx) differs from the true solution x by the difference vector δx. It is important to show how the difference vector behaves depending on the size of the matrix B_p. Equation (3) shows that R should be a contraction mapping: its eigenvalues should all be non-negative and smaller than one. The smaller the eigenvalues are, the closer (I − R)^{-1} is to the identity matrix, as can be seen from the limit in (3), and consequently the closer B_p^{-1} is to B^{-1}.

The system of linear equations based on the Hilbert matrix (an ill-conditioned, but non-singular matrix) is used here to illustrate the change in condition number for matrices of decreasing size. The numerical experiments with forward and backward error analysis are summarized in Table I. Table I contains the results for the rearranged Hilbert matrix. The rearrangement was done to reduce the Gibbs ringing due to the difference between the first and the last elements in each column and row. The forward (4) and backward (5) error analyses are defined as:

Δf = ||c − c_p||_2 / ||c||_2   (4)
Δb = ||b − b_p||_2 / ||b||_2   (5)

TABLE I
N   % freq   Con.#      ||c||_2   Δf     ||b||_2   Δb
5   100      4.7x10^5   0.64      -      0.63      -
4   83       1.2x10^5   0.62      0.29   0.63      6.9x10^-6
3   68       874        0.70      0.22   0.63      5.3x10^-5
2   43       24         1.09      1.20   0.62      4.3x10^-3
1   25       1          0.43      1.03   0.60      1.9x10^-1

Here N is the dimension of the square system, the second column defines how much of the total frequency magnitude is used in constructing the solution, and Con.# is the condition number of the matrix, computed as the ratio of the largest and smallest singular values. The forward and backward errors are denoted Δf and Δb, respectively. Solutions for the Hilbert system (vector c) are shown in Figure 1 and the restored right-hand side (vector b) in Figure 2.

Figure 1. Solutions of the linear system: exact solution (circle), N=4 (square), N=3 (diamond), N=2 (triangle up), N=1 (triangle down).

Figure 2. Restored right-hand side of Ac = b: exact vector b (circle), N=4 (square), N=3 (diamond), N=2 (triangle up), N=1 (triangle down).

Some interesting conclusions can be drawn from the figures. The restoration of b is fairly good even for the worst of the computed solutions. Consequently, the backward error analysis is too optimistic. The results of the forward error analysis are more realistic, but are not easy to compute, requiring the correct answer to evaluate an approximate one. The approach with eigenvalues (3) seems overly expensive, but to evaluate the convergence only the largest eigenvalue is required. If it is less than 1, the iterative improvement will converge.

IV. CONCLUSION
The approximate method for a fast solution of linear problems has been proposed. The quality of the approximation can be tested and controlled. We developed the equation which shows how the eigenvalues of the residual matrix determine the quality of the solution and the extent of the corrections to be applied in order to approach the true solution. The proposed method offers a more detailed assessment of the quality of the solution than classical forward and backward error analysis.

REFERENCES
[1] Jeričević Ž., Kožar I, "Faster Solution of Large, Over-Determined, Dense Linear Systems", DC VIS MIPRO (2013) 228-231.
Jeričević Ž., Kožar I, "Theoretical and statistical evaluation for approximate solution of large, over-determined, dense linear systems", DC VIS MIPRO (2015) 227-229. O’Leary, D.P., “Scientific Computing With Case Studies”, 2009, SIAM, Philadelphia, pp 383 Bracewell, R.N., “The Hartley Transform”, Oxford University Press, New York, 1986 p 160. Jeričević, Ž., “Approximate Solution of Linear Systems” Croatica Chemica Acta, Vol. 78 (2005) 601-615 Beylkin, G., Coifman, R., and Rokhlin, V, “Fast Wavelet Transforms and Numerical Algorithms”, Com. Pure App Math. Vol. 44 (1991) 141–183 MIPRO 2016/DC VIS Procedural Generation of Mediterranean Environments N. Mikuličić* and Ž. Mihajlović* * University of Zagreb, Faculty of Electrical Engineering and Computing, Zagreb, Croatia niko.mikulicic@gmail.com, zeljka.mihajlovic@fer.hr Abstract - This paper describes an overall process of procedural generation of natural environments through terrain generation, texturing and scattering of terrain cover. Although described process can be used to create various types of environments, focus of this paper has been put on Mediterranean which is somewhat specific and has not yet received any attention in scientific papers. We present a novel technique for procedural texturing and scattering of terrain cover based on cascading input parameters. Input parameters can be used to scatter vegetation simply by slope and height of the terrain, but they can also be easily extended and combined to use more advanced parameters such as wind maps, moisture maps, per plant distribution maps etc. Additionally, we present a method for using a satellite image as an input parameter. Comparing results with real-life images shows that our approach can create plausible, visually appealing landscapes. Keywords: procedural modeling, landscape generation, virtual environments, natural environments I. INTRODUCTION Reproducing realistic environments has always been a challenge in computer graphics mainly due to a large number of data needed to be simultaneously processed and displayed on screen. With many optimization methods and ever advancing graphics hardware, realistic landscapes have become achievable in real time and are widely used in computer games, military simulators and film industry. Recent trends show that virtual landscapes have also found their usage in touristic promotions and since environmental research has become an important topic in the last years, it can be expected for virtual environments to have a major role in simulating how changes in different environmental parameters influence the surrounding environment. Manually creating a whole environment would be a long lasting, tedious task, therefore many procedural techniques have been developed to aid the process. Although procedural, these methods usually have many parameters and finding the right values is often a time-consuming trial and error process. In recent years, various methods for inverse procedural modeling have been introduced [7][22]. Rather than setting parameter values manually, these methods provide ways to learn them. Nevertheless, procedural modeling remains the most common approach to generation of environments. MIPRO 2016/DC VIS Authors discussing generation of natural virtual environments usually reproduce mountain meadows and dense forests. Although visually appealing, tall and dense trees efficiently hide everything behind them so many objects can be safely culled and popping effects can be easily hidden. 
In this paper we focus on natural Mediterranean environments which have not yet been discussed. Mediterranean environments often lack tall trees and instead have very dense shrubbery. Since shrubs often cannot hide the landscape behind, various optimization techniques have to be implemented with extra care for popping effect to be properly hidden and for the environment to be efficiently rendered in real time. Our generation process is being done in four steps: preparation, terrain mesh generation, texture mapping and scattering of terrain cover. In preparatory step, a user defines a set of textures and models to be used and assigns them input parameter values. The second step is used to create a terrain triangle mesh from heightmap which is followed by procedural generation of splat map and texturing of the terrain in the third step. The last step places the terrain cover using scattering algorithms to achieve natural randomness. Although divided into four steps, the user often returns to preparatory step to adjust parameter values and then repeats the process until he is satisfied with results. The whole process needs to be repeated only when user changes the terrain geometry. Otherwise, user can focus on just one step. Although the focus of this paper is on Mediterranean, the described process can also be used to reproduce any other natural environment simply by using different textures and models. After presenting related work in chapter two, chapter three continues by explaining techniques used to represent terrain, textures and terrain cover as well as the optimizations required for real time performance. Fourth chapter focuses on algorithms used for procedural texturing and scattering of terrain cover. We present our results in chapter five and give conclusion and some guidelines for future work in chapter six . II. RELATED WORK This work belongs to procedural modeling of natural environments, and as such, it connects various fields of computer graphics. 277 Procedural modeling automates the process of content creation by using set of rules and algorithms rather than creating content manually. It is often applied where manual creation of content would be too cumbersome a task. Procedural modeling has been successfully used to create various types of content some of which include textures [6], plants [10], buildings [12], cities [18], and terrains [8]. Terrain generation is often being done in two steps: generation of heightmap and creation of 3D mesh. Heightmaps are usually generated using fractal algorithms [1][11], noises [1][6] and erosions [1][13][17]. After creating a heightmap, terrain mesh is generated and optimized using level of detail algorithms. Static LOD algorithms reduce computational at the cost of memory demands and often need nontrivial and rigid preprocessing stage [23]. Continuous LODs are generated dynamically and are therefore more flexible and scalable but also computationally more demanding [15][21]. Forest representation includes representations of individual trees as well as the process of achieving their natural distribution in ecosystem. Trees have been represented with level of detail [3], billboards [20], parallel billboards [9], volumetric textures [16] and billboard clouds [5]. To create an ecosystem, trees are scattered using global-to-local approaches [4][14] that define some global rule by which the scattering occurs. 
Another approach is local-to-global [2][4][14] that models interaction between individual plants from which the global distribution arises. III. RENDERING Natural environment consists of several somewhat independent layers: terrain, textures and terrain cover. In order to achieve efficient rendering, optimization should be done in each layer. Terrain is usually represented with a heightmap which can be generated procedurally or downloaded from Internet in case real-world data is needed. Fig. 1 shows heightmap of Croatian island Ist (left) and heightmap generated using Perlin noise and Fractional Brownian motion (right). To generate terrain triangle mesh from heightmap we use Lindstrom-Koller simplification [15] on a single terrain region. Creating multiple regions with levels of detail is left for future work. Textures are applied to terrain by simple planar texture mapping. Splat mapping technique is used to provide more surface details close to observer while simple color mapping is applied to more distant areas where details are not visible anyway. Terrain cover is split into two subcategories: details and high cover. Details are small objects, like grass, visible only from short distances. Due to their size, many objects are needed to cover any portion of a terrain so they tend to use up most of the available computational resources. In order to achieve real-time rendering of large fields covered with grass, various optimization methods have to be implemented. Modeling each blade of grass independently would quickly reach a maximum of vertices and polygons 278 Figure 1. Heightmap of Mediterranean island Ist, Croatia (left) and procedurally generated heightmap (right) a graphics card can process in real time and for that reason, more efficient approaches have been developed. We represent multiple blades of grass with one axial billboard. Many billboards are needed to cover an area with grass and sending each billboard to GPU independently would mean too many draw calls and therefore, slower rendering. Instead, multiple billboards are grouped together and sent to GPU as point clouds in which every vertex represents the position of a single billboard. Billboard geometries are then created at runtime in geometry shader at positions defined by vertices in the given point cloud. Additionally, since details are visible only from short distance, all billboards with distance to observer greater than some user defined value are culled. Terrain split to regions would fasten the culling process, allowing large groups of billboards to be discarded with a single distance check. To avoid sudden popping effects, a transitional area is used in which a billboard goes from completely invisible to fully visible using alpha cutout technique. High cover includes all objects that can be viewed from greater distances like trees or larger rocks. To optimize the number of polygons in a scene, levels of detail have been used. Objects close to the observer are rendered using high quality models, while lower quality models are used on objects that are further away. To avoid popping effects while transitioning between different models of the same object, a cross-fade technique is used. Both models are rendered for some small amount of time, while one is slowly fading in and the other is fading out. For the lowest quality model of a tree we use a simple axial billboard. IV. 
PROCEDURAL GENERATION In this chapter we focus on procedurally transforming input parameters into texture weights and terrain object placement probabilities. Texture weights are used to create splat maps and color maps for terrain texturing while terrain object placement probabilities are used to generate terrain cover using scattering algorithms. A. Weights Calculation Input parameters are defined in preparatory step. For each input parameter, every texture and terrain object defines minimum and maximum values and a weight function to describe parametric area it resides in. For example, a texture can be given a slope input parameter. To define a parametric area it resides in, a texture has to specify minimum and maximum slope. Weight function is used to describe texture's weight or preference in area MIPRO 2016/DC VIS between minimum and maximum. Fig. 2 represents a common weight function although any curve can be used. Weight functions are defined in interval and scaled to fit specified range. Once we have defined parametric areas for all textures and terrain objects, their weights can be easily calculated. For texture or terrain object with parameters and , the weight is defined as: weight functions (1) is parameter where is the number of parameters and dependent function which maps terrain position to parameter value. It can represent calculating surface slope, elevation, sampling value from texture etc. Put simply, final weight is calculated by multiplying weights of individual parameter values at given position. B. Texturing Texture weights are calculated to give us information how much each texture belongs to specified area. This information can be used to procedurally texture a terrain. For terrain texturing we used splat mapping and color mapping techniques. Color mapping technique textures a whole terrain with just one texture. One pixel of color map usually covers not as small portion of terrain so this technique usually results with dull and blurry terrains when observed from close. It would require extremely large textures to create detailed surface using this technique which would be quite inefficient. To add more surface detail, splat mapping technique is used. Splat mapping uses multiple high detailed surface textures that are mapped to a small part of terrain and then tiled to fit the rest. One additional texture called splat map is mapped to whole terrain and used to provide information on how much each surface texture contributes to the final color of surface. Final color is determined dynamically in fragment shader by sampling all surface textures and multiplying sampled colors with their contributions sampled from splat map. Splat map can be directly generated from weights defined in previous chapter. One pixel in splat map corresponds to terrain area at position . Texture weights are then calculated using (1), scaled to sum of 1 and stored in a single splat map pixel, one channel per surface texture. This means that one splat map can store weights for four surface textures. We can always use a second splat map for four new surface textures, but that can become inefficient. Usually, four surface textures are more than enough. Color map is generated similarly. Instead of storing texture weights and calculating surface color dynamically in fragment shader, colors are sampled, mixed and baked into a color map. To avoid high frequencies in resulting texture, colors are not sampled from original surface textures but rather from their last mipmap. MIPRO 2016/DC VIS Figure 2. 
Weight function C. Scattering of Terrain Cover After procedurally texturing a terrain, next step is to provide it with grass and high cover. To achieve natural randomness and uniform distribution of grass, we use a regular grid with random displacement algorithm. Terrain is split into a regular grid with one grass billboard assigned to the center of each cell. Each billboard is then displaced in random direction by amount small enough to remain inside the given cell. For each billboard position , grass weights are calculated using (1). Grass type is then chosen using roulette wheel method where weight of grass type is proportional to associated probability. Potential collisions of grass on cell borders can be ignored as they visually do not make much difference. Collisions between trees, however, should be avoided. Trees can have different sizes and placing them in a regular grid would not be as easy as it was with grass. Therefore, for scattering trees we use simple random scattering with collision detection. We randomly choose a position, calculate tree weights and choose a tree sort using roulette wheel method. If chosen sort cannot fit onto position without colliding with surroundings, we remove the sort from consideration and try with another. The algorithm ends when all trees have been placed or the maximum number of iterations has been reached. In reality, forest is not just trees being randomly scattered. They are grown from seeds, fighting for space and resources. Stronger trees prevail giving birth to new ones and the process repeats. New seeds fall in the vicinity of mother tree which results in trees than are not just randomly scattered but rather grouped. To simulate this process we have implemented a survival algorithm with environmental feedback similar to algorithm described in [2]. At the beginning of each iteration, all plants generate seed at random positions inside seeding radius. Seed that falls outside of terrain or on a ground that is not inside specie's parametric area is removed. Next step is detecting collisions between plants. Plants in collision are fighting for survival and weaker plant, a plant with smaller viability [2], is eliminated. At the end of iteration, all plants increase in age and plants that have reached their maximum age are removed from the ecosystem. The algorithm ends after defined number of iterations. Fig. 4 shows comparison of these three scattering algorithms where different tree species are being represented by circles of different colors and radii. Grid displacement algorithm (left) gives good uniform distribution of objects of the same size. Random scattering with collision detection (middle) can successfully scatter objects of different sizes, however, it does not produce clusters of same species like survival algorithm (right) which gives forest a more natural look. 279 Figure 4. Scattering algorithms. Scattering on regular grid with random displacement (left), random scattering with collision detection (middle) and survival algorithm (right) D. Classifying Satellite Image For procedural texturing and scattering of terrain cover, we can also use a satellite image as an input parameter. To translate image into weights we use a simple, color based classification. To calculate weight at certain terrain position, satellite image is sampled and given color mapped to HSV space where boundaries between colors are more obvious. Fig. 3 shows a satellite image with its classification. 
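A minimal sketch of such a color-based classifier is given below. The hue boundaries and the three cover classes (sea, vegetation, bare ground) are illustrative assumptions, not the values used for Fig. 3; the binary weight corresponds to treating a matching cover type as weight 1 and anything else as 0, as discussed in the text.

```python
import colorsys

# Illustrative hue/saturation thresholds -- the exact boundaries used in the
# paper are not published, so these class rules are assumptions.
def classify_pixel(r, g, b):
    """Map an RGB satellite sample (0-255 per channel) to a coarse cover type via HSV."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue = h * 360.0
    if 180.0 <= hue <= 260.0 and s > 0.2:
        return "sea"            # blue-ish, reasonably saturated
    if 60.0 <= hue <= 180.0:
        return "vegetation"     # green-to-cyan hues
    return "bare"               # everything else: rock, soil, beach

def cover_weight(terrain_object_type, r, g, b):
    """Binary weight: 1 if the sampled cover matches the object's type, else 0."""
    return 1.0 if classify_pixel(r, g, b) == terrain_object_type else 0.0

# Example: a bush tagged "vegetation" sampled over a greenish pixel vs. open sea.
print(cover_weight("vegetation", 90, 130, 70))   # 1.0
print(cover_weight("vegetation", 20, 60, 150))   # 0.0
```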
Due to atmospheric scattering, images taken from great distances have their colors shifted towards blue. This makes classifying harder and more advanced terrain cover classifier would be needed to fully indentify the coastal line or tell difference between sea and vegetation. More advanced methods for classifying surface cover have been developed [19], however this approach is simple and gives good enough results in natural, uninhabited areas. After color sampled from satellite image is classified to surface cover type, we can treat that information as weight equal to if terrain object is of specified type, or if it isn't. V. RESULTS To evaluate our approach we will make comparisons in different scales between real and procedurally generated Mediterranean landscape. Fig. 5 shows Croatian island Ist textured using different input parameters. Fig. 5 (left) represents terrain textured using slope and height. This method is generic and usually gives good results on both real and procedural terrains. Fig. 5 (middle) represents the same landscape textured using terrain cover classification from satellite image. Terrain is textured more precisely using this method, however it can only be used on real-world Figure 3. Satellite image of Ist (left) and classification of terrain cover (right) landscapes. Additionally, procedurally generated terrain is never a perfect representation of real one due to inaccuracies in heightmap, interpolations and simplifications of terrain geometry. For that reason, coastal line is often misaligned as it can be seen on Fig. 5 (middle). Fig. 5 (right) tries to solve the coastal line problem using combination of previous two approaches. The coastal line height is defined and everything above that level is textured using terrain cover classification while everything below using only slope and height as input parameters. This makes coastal line more monotonous but consistent. Fig. 6 shows satellite image of smaller, uninhabited part of Ist and its procedural representation generated using already mentioned coastal line reconstruction approach. The technique proves to be more reliable in uninhabited areas with image taken from closer distance. In addition to textures, Fig. 6 (right) also contains trees scattered according to terrain cover classification. Following image (Fig. 7) shows the same terrain from the ground. To represent Mediterranean grass we used textures that are similar to following Mediterranean plants: Lactuca serriola, Trisetum flavescens, Conyza sumatrensis, Eupatorium cannabinum, Urtica dioica and Melilotus sp. Also, a model of Arbutus unedo has been used as a bush. To have more control over distribution of each species we included additional spread parameter for every grass and tree type. Finally, Fig. 8 displays a comparison between real and procedurally generated landscape. Every object on the image is procedurally scattered except for the two bushes in front which are placed manually for easier comparison. Figure 5. Comparison of procedural texturing by slope and height (left), by classifying satellite image (middle) and using coastal line reconstruction approach (right) 280 MIPRO 2016/DC VIS Figure 6. Satellite image of Dumboka cove at Ist (left), terrain cover classification (middle) and procedurally generated landscape (right) Figure 7. Procedurally generated Dumboka cove from ground Figure 8. Comparison between real (left) and procedurally generated landscape (right) MIPRO 2016/DC VIS 281 VI. 
CONCLUSION AND FUTURE WORK We presented a method for procedural generation of environments through terrain generation, texturing and scattering of terrain cover using cascaded input parameters. By calculating and multiplying weights of individual input parameters we obtain information about how much each texture or species wants to reside on given terrain position. That information can be used as texture's contribution to terrain color when texturing the terrain, or as probability when scattering the terrain cover. Height and slope of the terrain have proven to be reliable input parameters for generic uses. In case of real-world terrains, simple terrain cover classification from satellite image can be used to provide basic information for texturing and scattering of terrain cover. To fix coastal line issues caused by geometry misalignments between real and generated terrain, we introduced a coastal line reconstruction approach. Additionally, we used a spread parameter to have more control over distribution of each species. Comparing real life images with those generated procedurally, we believe that our method can create plausible and visually appealing Mediterranean landscapes. Many suggestions for future work can be made. Terrain should be split into regions for faster view frustum culling of terrain geometry, grass and trees. Level of detail algorithms should be used to dynamically reduce the polygon count of faraway terrain regions. Those regions could also use cheaper texturing technique, for example color mapping instead of splat mapping. Closer, high detail regions could use normal mapping or even fractal displacement of geometry to achieve 3D look of Mediterranean karst. Scattering terrain cover based on satellite image would benefit from more advanced terrain cover classifier. Although height and slope as input parameters give good generic results, they are not primary factors in shaping Mediterranean environment and it would be interesting to see how would additional parameters, like wind map and resistance to wind and salt, affect the final look of the environment. REFERENCES [1] [2] [3] [4] 282 T. Archer, "Procedurally generating terrain", 44th annual midwest instruction and computing symposium, Duluth, 2011., pp. 378393. B. Beneš, "A stable modeling of large plant ecosystems", In Proceedings of the International Conference on Computer Vision and Graphics, Association for Image Processing, 2002, pp. 94– 101. C. Colditz, L. Coconu, O. Deussen, C. Hege, "Real-time rendering of complex photorealistic landscapes using hybrid level-of-detail approaches", 6th Interational Conference for Information Technologies in Landscape Architecture, 2005. O. Deussen, P. Hanrahan, B. Lintermann, R. Měch, M. Pharr, and P. Prusinkiewicz, "Realistic modeling and rendering of plant ecosystems", In Computer Graphics (Proceedings of ACM SIGGRAPH) (1998), pp. 275–286. [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] X. Décoret, F. Durand, F. X. Sillion, J. Dorsey, "Billboard clouds for extreme model simplification", In ACM Transactions on Graphics (Proceedings of ACM SIGGRAPH) (2003), pp. 689– 696. D. S. Ebert, S. Worley, F. K. Musgrave, D. Peachey, and K. Perlin, Texturing & Modeling, a Procedural Approach, Elsevier, 3rd edition, 2003. A. Emilien, U. Vimont, M.-P. Cani, P. Poulin, and B. Beneš, "WorldBrush: Interactive example-based synthesis of procedural virtual worlds", ACM Transactions on Graphics (SIGGRAPH 2015), vol. 34, issue 4, pp. 11. J.-D. 
Génevaux, É. Galin, É. Guérin, A. Peytavie, and B. Beneš, "Terrain generation using procedural models based on hydrology", ACM Transactions on Graphics (SIGGRAPH 2013), vol. 32, issue 4, 143:1–13. A. Jakulin, "Interactive vegetation rendering with slicing and blending", In Eurographics 2000 (Short Presentations), 2000. R. Měch, and P. Prusinkiewicz, "Visual models of plants interacting with their environment", In Computer Graphics (Proceedings of ACM SIGGRAPH) (1996), pp. 397–410. G. S. P. Miller, "The definition and rendering of terrain maps", In Computer Graphics (Proceedings of ACM SIGGRAPH) (1986), vol. 20, pp. 39–48. P. Müller, P. Wonka, S. Haegler, A. Ulmer, and L. Van Gool, "Procedural modeling of buildings", In ACM Transactions on Graphics (SIGGRAPH, 2006) , vol. 25, issue 3, pp. 614–623. F. K. Musgrave, C. E. Kolb, and R. S. Mace, "The synthesis and rendering of eroded fractal terrains", In Computer Graphics (Proceedings of ACM SIGGRAPH) (1989), pp. 41–50. B. Lane and P. Prusinkiewicz, "Generating spatial distribution for multilevel models of plant communities", In Proceedings of Graphics Interface ’02., vol. 1, pp. 69–80. P. Lindstrom, D. Koller, W. Ribarsky, L. F. Hodges, N. Faust, and G. A. Turner, "Real-time, continuous level of detail rendering of height fields", In Computer Graphics (Proceedings of ACM SIGGRAPH) (1996), pp. 109–118. F. Neyret, "Modeling, animating and rendering complex scenes using volumetric textures", IEEE Transactions on Visualization and Computer Graphics, vol. 4, issue 1 (1998), pp. 55–70. J. Olsen, "Realtime procedural terrain generation", Technical Report, University of Southern Denmark, 2004. Y. I. H. Parish, and P. Müller, "Procedural modeling of cities", In Computer Graphics (Proceedings of ACM SIGGRAPH) (2001), pp. 301–308. S. Premoze, W. B. Thompson, and P. Shirley, "Geospecific rendering of alpine terrain", In Rendering Techniques (1999), pp. 107–118. J. Rohlf, J. Helman, "IRIS performer: A high performance multiprocessing toolkit for real-time 3D graphics", In Computer Graphics (Proceedings of ACM SIGGRAPH) (1994), pp. 381– 394. F. Strugar, "Continuous distance-dependent level of fetail for rendering heightmaps (CDLOD)", In Journal of Graphics, GPU, and Game Tools, 2009, vol. 14, issue 4, pp. 5774. O. Št’ava, S. Pirk, J. Kratt, B. Chen, R. Měch, O. Deussen, and B. Beneš, "Inverse procedural modelling of trees", Computer Graphics Forum, 2014, vol. 33, issue 6, pp. 118–131. T. Ulrich, "Rendering massive terrains using chunked level of detail control", SIGGRAPH Super-Size It! Scaling Up to Massive Virtual Worlds Course Notes, 2002. MIPRO 2016/DC VIS Energy-Aware Power Management of Virtualized Multi-core Servers through DVFS and CPU Consolidation Hamed Rostamzadeh Hajilari *, Mohammad Mehdi Talebi * and Mohsen Sharifi * *Iran University of Science and Technology/School of Computer Engineering,Tehran, Iran hamedrostamzade@comp.iust.ac.ir, mehdi_talebi@comp.iust.ac.ir, msharifi@iust.ac.ir Abstract— Considerable energy consumption of datacenters results in high service costs beside environmental pollutions. Therefore, energy saving of operating data centers received a lot of attention in recent years. In spite of the fact that modern multi-core architectures have presented both power management techniques, such as dynamic voltage and frequency scaling (DVFS), as well as per-core power gating (PCPG) and CPU consolidation techniques for energy saving, the joint deployment of these two features has been less exercised. 
Obviously, by using chip multiprocessors (CMPs), power management with consideration of multi-core chip and core count management techniques can offer more efficient energy consumption in environments operating large datacenters. In this paper, we focus on dynamic power management in virtualized multi-core server systems which are used in cloud-based systems. We propose an algorithm which is effectively equipped by power management techniques to select an efficient number of cores and frequency level in multi-core systems within an acceptable level of performance. The paper also reports an extensive set of experimental results found on a realistic multi-core server system setup by RUBiS benchmark and demonstrates energy saving up to 67% compared to baseline. Additionally it outperforms two existing consolidation algorithms in virtualized servers by 15% and 21%. Index Terms— CPU Consolidation, Energy Saving, DVFS, PCPG, Multi-core Servers. I. INTRODUCTION Due to large amount of energy consumed in today‟s servers, there is a growing need for energy-aware resource management. Large volume of energy consuming has implications on electricity costs causing many datacenters reporting millions of dollars for annual usage. 30$ billion is reported for enterprise power and cooling in the world annually. Additionally, a more recent report from the Natural Resources Defense Council (NRDC) claims waste and inefficiency in U.S. data centers that consumed a massive 91 billion kWh of electricity in 2013 will increase to 140 billion kWh by 2020, the equivalent of 50 large (500 megawatt) power plants. The amount of energy consumed by the world‟s data centers will triple in the next decade, putting an enormous strain on energy supplies and dealing a hefty blow to efforts to contain global warming, experts say. It is probably safe to say that most data center managers are dealing with the challenge of increasing data growth and limited IT resources and budgets. With “save everything forever” strategies becoming more prevalent for many organizations. Traditionally active servers were focused for energy efficiency. The former approaches aim at reducing the number of active servers in a datacenter by consolidating all the incoming workloads into fewer server machines whereas the latter attempts to keep the performance of each active MIPRO 2016/DC VIS server in accordance with the assigned workload so that energy can be saved at the level of each server. A data center is designed to provide the required performance based on its service level agreements (SLAs) with clients even during workload hours, and hence, its resources are vastly under-utilized at other times. As an example, the minimum and the maximum utilization of the statically provisioned capacity of Facebook‟s data center are 40% and 90%, respectively [1]. Therefore, a great amount of energy costs can be reduced by consolidating workloads into as few server machines as possible and turning off the unused machines. The server consolidation has been considered in many studies; and virtual machine migration (VMM) has been used as a means of server consolidation in many researches [2], [3], [4], [5] and [6]. Due to overheads and limitations associated with the server consolidation like high network latency, large system boot time, other approaches still can be concerned for energy saving. Because of these limitations, consolidation decisions are made in long periods of times. 
The longer the period is, the more server machines are still under-utilized which implies that there is still hoping to use other techniques to improve energy efficiency. Focusing on each active sever is another state-of-art technique. While focusing on each active server; two widely accepted and employed techniques are used which are sleep states and Dynamic Voltage and Frequency Scaling (DVFS). Obviously these new approaches can be used as a complementary with the previous techniques. DVFS was introduced decades ago and is one of the wellknown and common energy-aware CPU management techniques which has been studied in many researches [2] [7] [8]. DVFS has its own limitations. First the supply voltages have already become quite low and therefore a small amount of further supply voltage reductions is possible. In addition, typically in datacenter servers there are two or more processor chips containing multiple CPUs with a single onchip power distribution network, shared by all the CPUs, which makes it impossible for CPUs to operate in different supply voltage level and hence various clock frequencies. This limitation will result in under-utilized CPUs where the available performance level is higher than what is actually needed, hence energy is wasted. Another well-known CPU energy management technique is Dynamic Power State Switching. In Many modern processors, it is possible to turn on or off the cores or CPUs. The workload for each CPU can be measured by OS and it is possible to Turn CPUs on or off. In other words each CPU is placed in its own power domain and therefore the power to each such domain can be independently gated. You should notice that the suggestion from the OS may not be a good one that‟s why it may be cancelled by Power Control 283 Unit (PCU) which resides in the processor chip. PCPG is the deepest sleep state in modern multi-core processors which is capable to shut down a core completely. So a gated core has nearly zero energy consumption received a lot of attention. CPU consolidation is a system level technique to consolidate load to fewer active cores and the remained cores can be turned off. Traditionally, clock gating was used for the unloaded cores which also leads to energy saving. But, since modern processors support PCPG and CPU consolidation, they lead to significant energy saving. Obviously this approach beside DVFS can report better results. In this paper, our algorithm is equipped with both of the mentioned approaches. We suppose virtualized multi core servers that make our approach appropriate for cloud computing environments. In our approach we present a method to find the best choice of the number of active cores and appropriate frequency according to the related workload and then power gate remained cores. All of the results are obtained from the actual hardware measurements and not simulations. Experiments with implementation of our algorithm on a realistic multi-core server system setup have shown 67% percent improvement in energy consumption. The rest of the paper is organized as follows. Section II introduces related work on using PCPG and DVFS for energy efficient power management in datacenters. Section III describes problem definition for multicore processors of servers. In section IV our energy consumption model is presented and in section V the proposed energy aware mechanism based on energy model is described. We evaluate proposed mechanism compared to existing works in section VI. 
Finally in section VII conclusion and future works are presented. II. RELATED WORKS Despite various studies in power management of processors which exploit DVFS [2], [7], [9] and [10] rather than sleep states, recently presenting PCPG as deepest sleep state spread out the use of sleep states in datacenters [11] [12], [13], [14] and [15]. The Jacob Leverich and et al. research was the first one in this area which evaluates the use of PCPG in datacenters and demonstrated the importance of PCPG in power managements systems [12]. Their results show that PCPG solely can save energy up to 40% which is 30% more than using DVFS as a knob. They also show that the joint deployment of these two knobs together could save energy up to 60% in datacenters. As respect to considerable energy saving by PCPG, gating the power of cores has energy and performance overheads [16], [14] and [17]. Therefore, frequent mispredictions may result in inefficiency power management and power dissipation. Niti Madan et al. demonstrate that it is vital to attend PCPG by guarded mechanisms to hinder such negative consequences on intra-core level [17]. Reference [16] demonstrate that in addition to intra-core algorithms, inter-core algorithms also suffer from such negative outcomes and may lead to power dissipation, unlike expectations. Therefore, it is crucial to exploit guard mechanisms to prevent such negative outcomes. In [16] the authors use a guarded mechanism to decide when to gate the cores in both intra-core and inter-core gating algorithms. We also considered power and performance overheads of PCPG in our mechanism. Power overhead is considered in power model and due to the interval between the high threshold 284 and maximum capacity of active cores; there is always room to mitigate performance overhead. In the datacenters, due to power capacity overload and overheating caused by high server densities, system failures may occur. There is a need for robust and definite power managements to avoid system failures. As gated core doesn‟t consume energy, it has high potential to widely be used in power capping problems. Kai Ma et al. propose an integration design of using PCPG and DVFS/overclocking in such problems to not only satisfy power constraints but also to optimize performance [13]. They were trying to optimize the performance of a Chip Multiprocessor (CMP) within a given power constraint (i.e., power capping) but we are minimizing the energy consumption of CMPs in virtualized servers within an acceptable performance. In the datacenters experiencing low utilization, CPU consolidation technique equipped by PCPG present significant energy saving [18], [11] and [19]. With commercial hardware support of Core-level Power Gating, consolidation becomes a promising power management technique in datacenters [20]. In [1] the authors presented a technique called Core Count Management (some variant of the CPU consolidation technique) and reported 35% energy saving. However, they reported both power and performance results based on simulations performed by using simple power and performance models. Their work only uses PCPG but we propose a joint deployment of PCPG and DVFS. The most related works to ours are [11] and [18] which use joint techniques of DVFS and CPU consolidation together in virtualized environments. In spite of aforementioned researches, they implement a consolidation algorithm on a realistic multi-core server system setup. 
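The interaction can be reproduced with a small thought experiment. In the toy model below, a fast DVFS loop and a slow, decoupled PCPG loop react to the same utilization signal; all numbers (thresholds, frequency steps, the fixed demand) are illustrative assumptions rather than measured values, and the model only serves to show why decoupled controllers can settle on all cores active at a low frequency.

```python
# Toy model of the hindrance described above: a fast DVFS loop and a slow,
# decoupled PCPG loop reacting to the same utilization signal. All constants
# are illustrative assumptions.
FREQS = [1.6, 2.0, 2.4, 2.667]   # GHz, a subset of the P-states
LOW, HIGH = 0.4, 0.8             # utilization thresholds used by both controllers

def utilization(demand_ghz, cores, freq):
    """Average per-core utilization for a fixed aggregate CPU demand."""
    return min(1.0, demand_ghz / (cores * freq))

cores, f_idx = 12, len(FREQS) - 1     # start: all cores on, maximum frequency
demand = 9.0                          # GHz of aggregate work (light load for 12 cores)

for tick in range(1, 41):
    u = utilization(demand, cores, FREQS[f_idx])
    # Fast loop: DVFS reacts every tick.
    if u < LOW and f_idx > 0:
        f_idx -= 1
    elif u > HIGH and f_idx < len(FREQS) - 1:
        f_idx += 1
    # Slow loop: PCPG reacts only every 10 ticks, and only to low utilization.
    if tick % 10 == 0 and utilization(demand, cores, FREQS[f_idx]) < LOW and cores > 1:
        cores -= 1

print(cores, FREQS[f_idx])   # all twelve cores stay on at 1.6 GHz: PCPG never
                             # sees utilization below LOW, so no core is gated
```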
The authors first investigated effect of CPU consolidation on power dissipation and performance (latency) of such systems, and concluded by presenting two new CPU consolidation algorithms for virtualized multi-core servers. Their algorithms blindly consolidate CPUs based on the predicted utilization requirement, but, in our work we consolidate and choose frequency level of active cores based on their energy consumptions. III. PROBLEM DEFINITION In most of the todays power management researches, for the sake of simplicity in design due to different performance overhead of DVFS and PCPG, a decoupled approach is used in which the power management knobs operate independently. Considering these power management knobs independently have substantial drawback which leads inefficient simultaneous functions. In [19] this issue has been discussed that DVFS hinders PCPG for the workloads of multi thread applications like Fluidanimate from PARSEC benchmark. DVFS and PCPG are applied in different periods of time independently and due to lower overhead, DVFS is used in smaller periods and PCPC is used in longer periods in contrary. While workload is low, DVFS which is running in shorter periods of time decides to reduce the frequency to keep cores in higher utilization. But high utilization makes PCPG to keep the cores working and so none of them is turned off. Authors in [19] indicate that using DVFS beside PCPG not only doesn‟t improve energy saving but also results higher energy consumption than PCPG solely. Therefore there is a need for an approach using both of the techniques without hindering each other. MIPRO 2016/DC VIS Core 0 Core 1 Core 2 Core 3 Core 4 Core 5 60% 60% 60% OFF OFF OFF OFF OFF OFF OFF OFF OFF Core 6 Core 7 Core 8 Core 9 Core 10 Core 11 Core 0 Core 1 Core 2 Core 3 Core 4 Core 5 90% 90% OFF OFF OFF OFF 0.4 OFF OFF OFF OFF OFF OFF 0.2 Core 6 Core 7 Core 8 Core 9 Core 10 Core 11 1 P8:1.6GHz 0.8 P7:1.733GHz P6:1.867GHz 0.6 P5:2GHz P4:2.133GHz P3:2.267GHz P2:2.4GHz P1:2.533GHz P0:2.667GHz 0 a. Low number of active cores 0 20 40 60 80 100 Core 0 Core 1 Core 2 Core 3 Core 4 Core 5 60% 60% 60% 60% 60% 60% Fig2. Linear function for energy consumption respect to utilization 60% 60% 60% 60% 60% 60% Core 6 Core 7 Core 8 Core 9 Core 10 Core 11 Core 0 Core 1 Core 2 Core 3 Core 4 Core 5 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% 65% OFF Core 6 Core 7 Core 8 Core 9 Core 10 Core 11 Our processor supports 9 P-states which P0 is the maximum performance state corresponding maximum frequency and P9 is lowest possible frequency that processor supports. You should notice that lower P-state results higher frequency and computation and in consequence higher energy consumption. The relations described in this section are derived from data collected in benchmarking experiments that CPU intensive workloads with different levels of CPU demand were applied to the servers under test. The power consumption of the servers was collected for all the operation frequencies. The data from the experiments suggest that for a fixed operating frequency, the power consumption of the server is approximately linear function of the utilization of the processor which is given by (1) where Ui is the utilization of the ith core and ai and bi are the constant value calculated from the collected data. b. High number of active cores Fig1. Effect of CPU consolidation on workload. CPU consolidation tries to resolve aforementioned issues based on utilization and frequency of cores. 
Now this question is raised that “Does consolidation always maximize energy saving?”. Our experimental result show that the answer is no. In low number of cores while turning off one core, the workload of the others is increased dramatically and makes them to work in higher frequency which results more energy consumption (Fig1.a). Conversely in High number of cores, reducing one of the active cores doesn‟t affect the workload of the others much, so there is no tangible change in the frequencies (Fig1.b). In this case turning off active cores is an effective approach. Our approach in this paper brings dynamicity to consolidation approaches to choose best number of active cores and frequency for minimizing energy consumption. IV. ENERGY CONSUMPTION MODEL In this section we present our approach for energy consumption model which the approach is based on. We first model the relation between core frequencies, number of active cores and energy consumption. We use presented model in our approach to set appropriate number of active cores and frequency in order to save energy. Obviously the frequency, performance state and energy consummation are tightly coupled elements of our model. In this part these relationships are calculated. In Fig.2 there are different performance states (P-State). Each P-State corresponds to a frequency supported by the processor. Processor_power = ai * Ui + bi , for each frequency I (1) In our approach constant parameter of linear relation between frequencies and utilizations are calculated beforehand. We use the parameters in our energy consumption model to select the number of cores and its performance state which minimize the energy consumption. In order to control the number of active cores, PCPG is used which is a circuit-level technique to cut off the power supply to a core. Obviously there are overheads for switching on and off a gated core. In [16] and [17] have discussed the overhead costs of the per core power gating, and demonstrate importance of guard mechanisms in using PCPG in order to ensure power savings. We also consider the overhead of the PCPG to achieve power saving by turning any core off. In this paper we use DVFS and PCPG simultaneously, so our approach must consider both of them. In each iteration we should decide about the number of active cores and related frequency, therefore energy consumption for target active cores and target frequency by considering the overhead for core gating should be calculated. Energy consumption is calculated by (2) which Ep is the consumed energy in the specified performance state and Eg is the overhead for turning cores on or off . (2) MIPRO 2016/DC VIS 285 Assume a server with „N‟ CPUs which all VMs have been consolidated to run on „n‟ active CPUs and that the remaining CPUs are power gated. The average CPU consumption of the ith CPU (Ei) is intrinsically related to the average CPU utilization (Ui) and the time interval. It can be modeled as follows: ∑ ∑ (3) Where the T is the control period. For the calculation of Eg if coretarget is bigger than corecurrent we have (4) otherwise Eg equals zero. ( ) (4) V. ENERGY AWARE MECHANISM In this section we present an energy aware algorithm to decide about the number of active cores and appropriate frequency simultaneously. So, due to assess required resources in different frequencies, there is need to find the relationship between utilization and frequency. In the case of a core, increasing the frequency will result decrease in utilization. 
In fact there is linear relationship between frequency and utilization which can be illustrated as: (5) Where Ui is the utilization of the core in frequency of Fi. Web servers are multi-thread applications which its load can be distributed to more CPUs. So, for the web servers we can extend (5) to (6). ∑ ∑ (6) Where Ci and Cj is the number of cores in state i and j, respectively. For the simplicity we assume all the cores work on the same frequency and having the same utilization. (7) Additionally as we demonstrated in the previous works [21] and [22], in virtualized servers there is contention between VMs in getting the physical resources. Our research demonstrated that for the ratio of vCPUs to physical CPUs bigger than a threshold, contention of VMs increase significantly. So, in this algorithm cminPossible is the minimum number of cores for preventing significant VM performance degradation due to resource contentions in an overbooked server. Input: N (total number of CPUs), fn (frequency), cn (the number active CPUs), Th (high threshold), and un (average utilization) Output: fn+1 (frequency), cn+1 (the number of active CPUs) 1: energy = max; 2: for core cn+1 in [cminPossible, cminPossible+1, … , N] 3: if Un+1(fmax) < Th then //need more core 4: for frequency f in all frequencies (ascending order) do 5: if un+1 > Th then //need more frequency 6: continue; 7: energyi = calculateEnergy(f, un+1, c) 8: if energyi < energy then 9: energy = energyi 10: fn+1 = f 11: cn+1 = core Fig3. Energy aware algorithm 286 In each iteration we assume that the amount of needed resources is the same with the previous cycle. In each cycle we choose the best number of active cores and specified frequency based on the workload amount in current iteration (Predicting the amount of consumed energy is out of the scope of this paper) in order to minimize E in (2). All of the possible states (frequency and number of active cores) are calculated and the best one is chosen. In each iteration, based on the predicted workload we can calculate the utilization of each core based on (7). The calculation is done by the presented energy model in previous section in which the overhead of changing the number of active cores in addition to energy consumption of active cores. You can find our algorithm on Fig 3. Un+1(fmax) represents utilization for the max frequency of current cores, if it is bigger than up threshold, then current cores cannot satisfy the workload, Hence we have to increase number of cores. We calculate all of the possible states based on the number of cores and possible p-states to minimize energy consumption. The method calculateEnergy calculates the energy consumption based on the selected frequency, load and the number of active cores and if the calculated energy is less than the previous states then the parameters are set. VI. EXPERIMENTAL RESULTS In this section, first we present our experimental testbed and benchmark which generates workload to evaluate the algorithm then three existing consolidation approaches are introduced and then the experimental testbed results are proposed. Next our algorithm is compared with the approaches which due to equal comparison conditions we also implemented them in Xen kernel. Finally, the physical testbed results are presented. Our server is equipped with two Intel Xeon X5650 processor and 8 GB RAM. Each one of the CPUs has 6 physical cores having 9 frequency levels: 2.667GHz, 2.533Gz, 2.4GHz, 2.267GHz, 2.133GHz, 2GHz, 1.867GHz, 1.733GHz and 1.6GHz. 
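To relate the selection loop of Fig. 3 to the power model of (1), the following sketch enumerates core counts and P-states exactly as in Fig. 3 and scores each feasible pair with a linear energy estimate. The coefficient tables A and B, the gating overhead E_GATE, the threshold TH and C_MIN are placeholders (the measured a_i, b_i values for the X5650 testbed are not reproduced here), so this is an illustration of the algorithm's structure, not its tuned implementation.

```python
# Sketch of the selection loop in Fig. 3 combined with the linear per-frequency
# power model of (1). A, B, E_GATE, TH and C_MIN are illustrative placeholders.
FREQS = [1.6, 1.733, 1.867, 2.0, 2.133, 2.267, 2.4, 2.533, 2.667]  # GHz, P8..P0
A = {f: 4.0 * f for f in FREQS}     # watts per unit utilization (assumed)
B = {f: 2.0 * f for f in FREQS}     # idle watts of an active core (assumed)
E_GATE, TH, C_MIN, T = 5.0, 0.8, 2, 1.0   # gating overhead (J), threshold, min cores, period (s)

def predicted_util(c_now, f_now, u_now, c_new, f_new):
    # Equations (6)-(7): the total demand c*u*f is conserved across configurations.
    return c_now * u_now * f_now / (c_new * f_new)

def energy(c_now, c_new, f_new, u_new):
    # Equations (1)-(4): per-core linear power over the control period,
    # plus a wake-up overhead when cores have to be un-gated (assumed per core).
    e_cores = c_new * (A[f_new] * u_new + B[f_new]) * T
    e_gate = E_GATE * max(0, c_new - c_now)
    return e_cores + e_gate

def select(c_now, f_now, u_now, n_total=12):
    best, best_cfg = float("inf"), (c_now, f_now)
    for c in range(C_MIN, n_total + 1):
        if predicted_util(c_now, f_now, u_now, c, max(FREQS)) > TH:
            continue                      # even fmax cannot carry the load: need more cores
        for f in FREQS:                   # ascending order, as in Fig. 3
            u = predicted_util(c_now, f_now, u_now, c, f)
            if u > TH:
                continue                  # need a higher frequency
            e = energy(c_now, c, f, u)
            if e < best:
                best, best_cfg = e, (c, f)
    return best_cfg

print(select(c_now=12, f_now=2.667, u_now=0.3))  # -> fewer active cores at a reduced frequency
```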
Our processors support PCPG as the deepest sleep state. Xen [23] is a virtual machine monitor widely used by large companies such as Amazon. We used Xen 4.4, the latest available version, as the hypervisor to host the VMs. Our algorithm is implemented in the Xen kernel, which has direct access to the resources needed for measurement and for manipulating the power management knobs. To avoid network limitations in the experiments, we placed the workload clients on the server as well. The server hosts 4 client virtual machines and 4 web server virtual machines, together with dom0. All the virtual machines run Ubuntu 14.04 with 20 GB of disk space. Each web server virtual machine has 2 vCPUs and 800 MB of RAM, while each client machine has 1 vCPU and 600 MB of RAM. We chose RUBiS (Rice University Bidding System) [24], an auction website benchmark modeled after eBay, to simulate real web servers. In RUBiS, multiple clients, whose number is defined beforehand, connect to the web server and simultaneously perform selling, browsing and bidding operations. Each client starts a session and performs these operations randomly after receiving the response to each request from the server. The benchmark starts with an up-ramp phase which progressively increases the number of sessions until the peak load is reached, so as not to create a massive sudden load that the server cannot tolerate. In the steady-state phase the peak load is maintained, and finally in the down-ramp phase the load is decreased until the end of the benchmark. Our approach is compared with three existing approaches. The first approach is the baseline, which does not use any power management knobs: both the frequency and the number of active cores in the server are at their maximum. The other two approaches, which use PCPG and DVFS simultaneously, are threshold-based algorithms. In the first consolidation approach, if the required resources exceed the upper threshold, it first checks whether there is any gated core; if there is, the algorithm brings the gated core up, and otherwise it increases the frequency of the cores. If the required resources fall below the lower threshold, it first tries a lower frequency that still satisfies the workload, and only if there is no lower frequency does it turn cores off [11]. All in all, this algorithm tries to keep the cores at the lowest possible frequency. The second consolidation algorithm, conversely, tries to keep the number of cores at the lowest possible value. When more resources are needed, it first tries to increase the frequency, and only if the frequency is already at its maximum does it increase the number of cores. When the required resources fall below the lower threshold, it first tries to decrease the number of cores, and only when the number of cores cannot be reduced further does it lower the frequency.

Table I. Energy efficiency and throughput results (normalized to the baseline).
Approach         Throughput   Energy improvement (%)
Base             1            n/a
Consolidation1   0.97         58
Consolidation2   0.94         55
Our algorithm    0.94         67

Table I shows the energy efficiency and throughput of each test relative to the baseline result. The throughput of each test is defined as the ratio of the number of served requests to the duration of the benchmark. The first row gives the results of the baseline approach defined above; the second and third rows are the experimental results of the two consolidation algorithms described, and the last row is our algorithm.
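For reference, the decision order of the two threshold-based baselines just described can be paraphrased as the following sketch. It captures only the ordering of the DVFS and PCPG actions; the state handling and policy names are placeholders, not the implementations that were actually evaluated.

```python
# Decision order of the two threshold-based baselines (illustrative sketch).
# consolidation1 keeps the frequency as low as possible;
# consolidation2 keeps the number of active cores as low as possible.

def scale_up(state, policy):
    """More resources needed (demand above the upper threshold)."""
    if policy == "consolidation1":
        if state["gated_cores"] > 0:
            state["gated_cores"] -= 1          # wake a gated core first
            state["active_cores"] += 1
        else:
            state["freq_level"] += 1           # only then raise the frequency
    else:                                      # consolidation2
        if state["freq_level"] < state["max_freq_level"]:
            state["freq_level"] += 1           # raise the frequency first
        elif state["gated_cores"] > 0:
            state["gated_cores"] -= 1          # only then wake a core
            state["active_cores"] += 1


def scale_down(state, policy):
    """Fewer resources needed (demand below the lower threshold)."""
    if policy == "consolidation1":
        if state["freq_level"] > 0:
            state["freq_level"] -= 1           # lower the frequency while possible
        else:
            state["active_cores"] -= 1         # then start gating cores
            state["gated_cores"] += 1
    else:                                      # consolidation2
        if state["active_cores"] > 1:
            state["active_cores"] -= 1         # gate cores while possible
            state["gated_cores"] += 1
        else:
            state["freq_level"] -= 1           # then lower the frequency
```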
Even though blind consolidation algorithms already achieve significant energy savings, there is still room for minimizing energy consumption. In Table I, since the baseline approach uses the maximum resources to serve the workload, we normalized all values to its results. Our algorithm reduces energy consumption by up to 67%, which is 15% more than the consolidation1 algorithm, the best of the existing consolidation algorithms. Note that although our algorithm achieves the highest energy saving, its throughput is lower than that of consolidation1, because consolidation1 brings up additional cores and thus exposes more resources when facing growth in the workload. We could mitigate this throughput degradation by exploiting an accurate prediction mechanism, but that is out of the scope of this paper. The reason why the throughputs of our algorithm and consolidation2 are equal is that, based on the energy consumption model, both algorithms behave similarly when the workload increases. However, when the workload decreases they take different approaches, and our algorithm achieves 21% more energy saving than consolidation2.

VII. CONCLUSION AND FUTURE WORKS

Due to the limitations of server consolidation approaches, servers still suffer from low resource utilization, hence there is a need for efficient energy management of processor power. Modern chip multi-processors provide two options to reduce energy consumption, DVFS and PCPG, which make CPU consolidation an effective knob in server power management. Even though CPU consolidation hinders the decoupled use of DVFS and PCPG, our experiments on modeling CPU power consumption at different frequencies and with different numbers of cores demonstrate that for a low number of cores, scaling out and operating at low frequencies yields more energy savings. Therefore, although blind consolidation approaches significantly reduce energy consumption, their energy saving is not maximized. Our proposed algorithm consolidates CPUs dynamically to minimize energy consumption. We also propose a novel energy model which takes the energy overhead of PCPG into account. Our experimental results report up to 67% energy saving without significant performance degradation; additionally, the proposed algorithm outperforms two existing CPU consolidation algorithms by 15% and 21% in energy saving. In this paper we did not consider any specific order for the ON cores in our consolidation mechanism. Choosing the right order of active and inactive cores is planned as future work: since a higher density of ON cores results in higher temperature, more energy may be needed for cooling, and hence the order of active cores is of great importance.

VIII. REFERENCES

[1] O. Bilgir, M. Martonosi and Q. Wu, "Exploring the Potential of CMP Core Count Management on Data Center Energy Savings," in Workshop on Energy Efficient Design, 2011.
[2] G. Dhiman, G. Marchetti and T. Rosing, "vGreen: a System for Energy Efficient Computing in Virtualized Environments," in Proceedings of the 2009 ACM/IEEE International Symposium on Low Power Electronics and Design, 2009.
[3] R. Nathuji and K. Schwan, "VirtualPower: Coordinated Power Management in Virtualized Enterprise Systems," in ACM SIGOPS Operating Systems Review, 2007.
[4] N. Bobroff, A. Kochut and K. Beaty, "Dynamic Placement of Virtual Machines for Managing SLA Violations," in Integrated Network Management (IM'07), 10th IFIP/IEEE International Symposium on, Munich, 2007.
[5] N. Van, F. D. T. Hien and J.-M.
Menaud, "Autonomic Virtual Resource Management for Service Hosting Platforms," in Proceedings of the 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing, 2009. [6] C. Clark, S. H. Keir Fraser, J. G. Hansen, E. Jul, C. Limpach, I. Pratt and A. Warfield., "Live Migration of Virtual Machines," in Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation-Volume 2, 2005. [7] G. Von Laszewski, L. Wang, A. J. Younge and X. He, "Power-aware Scheduling of Virtual Machines in DVFS-enabled Clusters," in In 287 Cluster Computing and Workshops, 2009. CLUSTER'09. IEEE International Conference on, 2009. [8] P. Pillai and K. G. Shin, "Real-time Dynamic Voltage Scaling for Low-power Embedded Operating Systems," in ACM SIGOPS Operating Systems Review, 2001. [9] J. Chen, T. Wei and J. Liang., "State-Aware Dynamic Frequency Selection Scheme for Energy-Harvesting Real-Time Systems," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on , vol. 22, no. 8, pp. 1679 - 1692, 2014. [10] Y. Wang, X. Wang, M. Chen and X. Zhu, "Power-Efficient Response Time Guarantees for Virtualized Enterprise Servers," in Real-Time Systems Symposium, 2008, Barcelona, 2008. [11] I. Hwang, T. Kam and M. Pedram, "A study of the Effectiveness of CPU Consolidation in a Virtualized Multi-Core Server System," in Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design, 2012. [12] J. Leverich, M. Monchiero, V. Talwar, P. Ranganathan and C. Kozyrakis, "Power Management of Datacenter Workloads Using PerCore Power Gating," in Computer Architecture Letters, 2009. [13] K. a. X. W. Ma, "PGCapping Exploiting Power Gating for Power Capping and Core Lifetime Balancing in CMPs," in Proceedings of the 21st international conference on Parallel architectures and compilation techniques, 2012. [14] H. Jiang, M. Marek-Sadowska and S. R. Nassif., "Benefits and Costs of Power-Gating Technique," in Computer Design: VLSI in Computers and Processors, 2005. ICCD 2005. Proceedings. 2005 IEEE International Conference on, San Jose, CA, USA, 2005. [15] J. Lee and N. S. Kim, "Optimizing Throughput of Power- and Thermal-Constrained Multicore Processors Using DVFS and Per-Core Power-Gating," in Design Automation Conference, 2009. DAC '09. 46th ACM/IEEE, San Francisco, CA, 2009. [16] M. Annavaram, "A Case for Guarded Power Gating for Multi-Core Processors," in High Performance Computer Architecture (HPCA), 2011 IEEE 17th International Symposium on, San Antonio, TX, 2011. [17] N. Madan, A. Buyuktosunoglu, P. Bose and M. Annavaram, "Guarded Power Gating in a Multi-Core Setting," in Computer Architecture, 2010. [18] I. Hwang and M. Pedram, "CPU Consolidation versus Dynamic Voltage and Frequency Scaling in a Virtualized Multi-Core Server," in Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 2013. [19] A. Vega, A. Buyuktosunoglu, H. Hanson, P. Bose and S. Ramani., "Crank It Up or Dial It Down: Coordinated Multiprocessor Frequency and Folding Control," in Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013. [20] J. B. Leverich, "Future Scaling of Datacenter Power-efficiency," Stanford University, 2014. [21] M. Sharifi, H. Salimi and M. Najafzadeh., "Power-efficient distributed scheduling of virtual machines using workload-aware consolidation techniques," The Journal of Supercomputing, vol. 61, no. 1, pp. 46-66, 2012. [22] H. Salimi and M. 
Sharifi, "Batch scheduling of consolidated virtual machines based on their workload interference model," Future Generation Computer Systems, vol. 29, no. 8, pp. 2057-2066, 2013.
[23] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt and A. Warfield, "Xen and the Art of Virtualization," in ACM SIGOPS Operating Systems Review, 2003.
[24] RUBiS: Rice University Bidding System, http://rubis.ow2.org/. [Online].

Human Posture Detection Based on Human Body Communication with Multi-carriers Modulation

Wenshu Ni1, Yueming Gao2, Zeljka Lucev3, Sio Hang Pun4, Mario Cifrek5, Mang I Vai6, Min Du7
1,2,4,6,7 Fujian Key Laboratory of Medical Instrument and Pharmacy Technology, Fuzhou University, Fuzhou, China
3,5 Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
1 595317255@qq.com, 2 fzugym@163.com, 3 zeljka.lucev@fer.hr

Abstract – Multi-node sensors for human posture detection, which acquire kinematic parameters of the human body, help to further study the laws of human motion. They can serve as a quantitative analysis tool for specific applications such as healthcare, sports training and military affairs. Compared with the traditional optical method, posture detection based on inertial sensors is less limited by space, has lower cost, and is easier to implement. In this paper, a human posture detection system is introduced. Using the parameter data obtained by the inertial sensors, the three-dimensional angles of the human hand movement are calculated via a quaternion algorithm for data fusion. The transmission of the angle data among the sensor nodes was realized by human body communication (HBC) transceivers based on capacitive coupling with multi-carrier OOK modulation at a data rate of 57.6 kbps, with a bit error rate (BER) of less than 10^-5. The human posture could then be reconstructed on the PC host. Ultimately, the implementation of the overall system showed its feasibility.

Keywords - Inertial sensors, Human body communication, Multi-carriers OOK modulation, Posture reconstruction

I. INTRODUCTION

In recent years, human action recognition combined with specific applications such as virtual reality, health care and sports training has attracted much attention. Human posture is closely related to the characteristics of some diseases, and its detection can serve as a quantitative tool for specific conditions, for example in physical therapy with quantitative approaches for geriatric training or for Parkinson's patients [1]. Due to improvements in Micro-Electro-Mechanical Systems (MEMS) [2][3], the size, accuracy, robustness and dynamic response of inertial sensors have been greatly improved. Compared with the traditional optical method, inertial sensors for posture detection have lower cost, smaller space requirements, and easier implementation. However, traditional means of communication among the inertial sensors are often based on Bluetooth, RF, ZigBee or WiFi [4], which have high power consumption and are likely to cause radiation and wireless signal aliasing. To reduce these impacts, human body communication (HBC) has been introduced. HBC uses the human body as the signal channel to transmit data among the nodes. The capacitive coupling method of HBC transmits signals by generating an electric field, so that sensors attached on or implanted in the human body can share information. HBC systems are divided into two types.
The first is the direct transmission of digital square wave signal without modulation[5][6], and the second is a modulation scheme with carrier frequency[7]. While the latter one is more conducive to achieve high data rate[8] and multichannel transmission[9]. In this paper we proposed a novel approach to the human posture detection system design. In order to save the channel resources and reduce the radiation impact, taking the direct measurement of the human body posture into account as well, capacitive coupling method of HBC with multi-carriers on-off keying(OOK) modulation was considered to achieve the two ways transmission of digital signals. The circuit of OOK modem has been proposed, which was capable to transmit the digital signal at the rate of 57.6kbps and the BER was less than 10-5. Besides, multi-carriers modulation method was adopted to realize the two channels parallel transmission in real time. II. SYSTEM DESIGN Body posture detection system can detect the precise angle information of human action, and play an important role as a quantitative tool for some specific application. In this paper, a human gesture detection system design was proposed. Three inertial sensors was capitalized on to collect and process information of the posture angle. HBC was responsible for the two ways sensor data transmission among the sensors at the data rate of 57.6kbps. Eventually, the reconstruction operation was implemented on the PC. Fig.1 Diagram of the proposed human posture detection system The structure of the system can be showed in Fig. 1, it comprises of: (1) Inertial sensor nodes, (2)HBC 289 device, (3)Reconstruction of body posture. To begin with, the inertial sensor node, which consists of accelerometer, gyroscope and magnetometer, will detect the kinematics data of the body. Fig.1 shows the structure of the sensor. In this paper, we had three sensors put on the up arm, lower arm and hand respectively to get the kinetic information of the three parts. Then algorithm of data fusion will figure out the three-dimensional angles of three parts in the free space. The capacitive coupling HBC was used for wireless transmission. Only the signal electrodes of TX and RX are attached to the body while the ground (GND) electrodes are left floating. The TX establishes the quasi-static electric field, while the RX detects the electric potential at the remote end in this application[10], therefore, the information could be shared between TX and RX, which performs less attenuation than the galvanic coupling. As shown in the Fig.2. In this mechanism, the node 1 got the upper arm posture data and then transmitted it with carrier 1 modulation. The node 2 received the data via the HBC receiver 1 and transmitted the data of node 2 and node 1with carrier 2 frequency modulation. Finally, the node 3 received former two channel signal via HBC receiver 2, and then the received signals together with the data of node 3 would be used to reconstruct the posture on the PC. Therefore a tree structure was formed to detect the arm and hand posture. The transceivers position setup on the human body is as Fig.5. Fig.2 The HBC mechanism III. METHOD A. Calculation of Angles In this system, inertial sensor module MPU6050, was adopted considering its convenience in wearable property. With sensor attached, the arm movement for example can be regarded as the coordinate change of the sensor model. 
Therefore the three-dimensional angles of the arm are equivalent to those of the sensor module, which can be expressed by a quaternion as shown in (1). Equation (2) is the carrier's transformation matrix C_b^R(q). With the data obtained by the accelerometer, gyroscope and magnetometer, the transformation matrix can be corrected using error vector products. The quaternion is then refreshed by numerical integration; the Runge-Kutta update is described by (3). Finally, since the quaternion transformation is equivalent to the Euler-angle transformation, the Roll, Pitch and Yaw angles can be calculated following (4)-(6).

Q = q_0 + q_1 i_0 + q_2 j_0 + q_3 k_0 = \cos\frac{\theta}{2} + (l i_0 + m j_0 + n k_0)\sin\frac{\theta}{2}   (1)

C_b^R(q) = \begin{pmatrix} 1-2(q_2^2+q_3^2) & 2(q_1 q_2 - q_0 q_3) & 2(q_1 q_3 + q_0 q_2) \\ 2(q_1 q_2 + q_0 q_3) & 1-2(q_1^2+q_3^2) & 2(q_2 q_3 - q_0 q_1) \\ 2(q_1 q_3 - q_0 q_2) & 2(q_2 q_3 + q_0 q_1) & 1-2(q_1^2+q_2^2) \end{pmatrix}   (2)

\begin{pmatrix} \dot{q}_0 \\ \dot{q}_1 \\ \dot{q}_2 \\ \dot{q}_3 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 0 & -g_x & -g_y & -g_z \\ g_x & 0 & g_z & -g_y \\ g_y & -g_z & 0 & g_x \\ g_z & g_y & -g_x & 0 \end{pmatrix} \begin{pmatrix} q_0 \\ q_1 \\ q_2 \\ q_3 \end{pmatrix}   (3)

Yaw = \arctan\frac{\sin\psi}{\cos\psi} = \arctan\frac{2 q_1 q_2 + 2 q_0 q_3}{1 - 2(q_2^2 + q_3^2)}   (4)

Pitch = \arcsin(2 q_1 q_3 - 2 q_0 q_2)   (5)

Roll = -\arctan\frac{2 q_2 q_3 + 2 q_0 q_1}{1 - 2(q_1^2 + q_2^2)}   (6)

B. Human body communication with OOK multi-carrier modulation

In order to transmit multiple signals, we use a multi-carrier OOK modulation method based on the frequency-division multiplexing principle. As shown in Fig. 3, the total bandwidth of the human body channel is divided into several sub-bands (or subchannels) that carry multiple signals modulated onto different carrier frequencies, while the receiver end separates the channels, so that multi-channel parallel transmission of signals without interference becomes possible. In the capacitive coupling of HBC, the energy of the electromagnetic wave is mainly in the form of an electric field when its frequency is below 10 MHz, which yields less radiation and better transmission characteristics [11]. In this system we use 6 MHz and 9 MHz as the carrier frequencies. Narrow band-pass filters centered at the carrier frequencies separate the two channel signals and remove the unwanted noise.

Fig. 3. Multi-carrier modulation of HBC.

The HBC on-off keying modulation scheme is shown in Fig. 4. The HBC transceiver realizes OOK modulation with a sinusoidal oscillator circuit and an analog switch. We produce a 3 V amplitude sine wave at 9 MHz or 6 MHz. The digital signal is fed to TXIN: when TXIN is '1', the sine wave is output, and when TXIN is '0', no signal is output; thus the OOK modulation is realized. In the receiver, the signals suffer attenuation and distortion. A passive Chebyshev band-pass filter, whose center frequency is the carrier frequency, is employed because its fast transition-band roll-off removes the unwanted signals. An envelope detector together with a comparator then demodulates the signal and keeps the electrical level within the TTL range, making sure the MCU of the receiver can read the digital signal directly via the serial port.
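Before turning to the measurements, the angle computation of (4)-(6) can be illustrated with a short sketch. It uses atan2 in place of arctan for numerical robustness, and the sample quaternion is an arbitrary placeholder rather than sensor data from this system.

```python
import math


def quaternion_to_euler(q0, q1, q2, q3):
    """Yaw/Pitch/Roll (radians) from a unit quaternion, following (4)-(6)."""
    yaw = math.atan2(2 * (q1 * q2 + q0 * q3), 1 - 2 * (q2 ** 2 + q3 ** 2))    # (4)
    pitch = math.asin(max(-1.0, min(1.0, 2 * (q1 * q3 - q0 * q2))))           # (5)
    roll = -math.atan2(2 * (q2 * q3 + q0 * q1), 1 - 2 * (q1 ** 2 + q2 ** 2))  # (6)
    return yaw, pitch, roll


if __name__ == "__main__":
    # Placeholder example: a 90-degree rotation about a single axis.
    s = math.sqrt(0.5)
    print([math.degrees(a) for a in quaternion_to_euler(s, 0.0, 0.0, s)])
```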
Fig. 4. The OOK modulation scheme.

IV. RESULT AND DISCUSSIONS

A. Experiment of HBC

In the measurement experiment, all of the circuit boards were powered by batteries in order to keep the transmitter node and the receiver node separated; the Tektronix 2024 oscilloscope used in the experiment, which is powered by its own internal battery, likewise keeps its two measurement channels isolated. A 24-year-old male's left arm was used as the experimental subject. The medical electrodes measured 4 cm x 4 cm.

Fig. 5. HBC devices experiment.

Fig. 5 shows the measurement setup and the waveform results. The electrodes of the transceivers were attached on the human arm and the back of the hand as described in Section II. The data was modulated and transmitted through the human body to the receiver board. The test signal was of NRZ type, generated by an STM32 serial port at a rate of 57.6 kbps. The measured output waveform of the recovered data could be seen clearly on the screen of the oscilloscope, showing that the receivers recovered the modulated signals and separated the two channel signals successfully. Fig. 6 shows the output of the modulation signal and the received signal of transceiver 1, whose carrier frequency is 6 MHz. After analysis, the bit error rate was less than 10^-5 at the data rate of 57.6 kbps, indicating that the HBC system works successfully.

Fig. 6. Modulation and demodulation waveforms compared with the original signal.

B. Reconstruction of the posture of human arm and hand

As shown in Fig. 7, three inertial sensors are respectively placed on the upper arm, the lower arm and the back of the hand to obtain the attitude angles of the three parts. As a result, the three-dimensional angles can be acquired by the computer. The result of the reconstruction can be seen in Fig. 7: there are three cylinders representing the human arm and hand, where the red part is the upper arm, the yellow one is the lower arm, and the purple one is the hand. The result shows that the system based on the inertial sensors was able to measure and reconstruct the human posture.

Fig. 7. The reconstruction of human posture.

V. CONCLUSION

In this paper, a human posture detection system based on inertial sensors and human body communication was proposed and implemented. Using a quaternion algorithm to calculate the three-dimensional angles of the human arm and hand, the posture information could be obtained. The human body was selected as the propagation medium for two-channel data transmission. The capacitive coupling method of HBC was adopted, with the frequencies of 6 MHz and 9 MHz chosen as the multiple carriers for the OOK-modulated HBC devices. The transceivers were implemented and the BER was less than 10^-5 at the designed data rate of 57.6 kbps. The arm and hand posture could be reconstructed on the computer, so that the results are visible and quantitative for specific applications.

[11] Joonsung Bae, Hyunwoo Cho, Kiseok Song, Hyungwoo Lee and Hoi-Jun Yoo, "The Signal Transmission Mechanism on the Surface of Human Body for Body Channel Communication," IEEE Transactions on Microwave Theory and Techniques, vol. 60, no. 3, pp. 582-593, 2012.

ACKNOWLEDGMENT

The authors would like to thank the Ministry of Science Foundation of China 2013DFG32530, the National Natural Science Foundation of China 61201397 and the Funds of the Department of Education of Fujian Province, China, JA13027.

REFERENCES

[1] Fay B. Horak, PhD, PT, and Martina Mancini, PhD, "Objective Biomarkers of Balance and Gait for Parkinson's Disease Using Body-worn Sensors," Balance and Gait Biomarkers, vol. 38, no. 11, pp. 1544-1551, 2013.
[2] N. Barbour and G. Schmidt, "Inertial sensor technology trends," IEEE Sensors J., vol. 1, no. 4, pp. 332-339, Dec. 2001.
[3] J. Barton, A. Lynch, S. Bellis, B. O'Flynn, F. Murphy, K. Delancy, and S. C.
O'Mathuna, "Miniaturized inertial measurement units (IMU) for wireless sensor networks and novel display interfaces," in Proc. Electronic Components and Technol. Conf., 2005, pp. 1402-1406.
[4] Yao-Chiang Kan and Chun-Kai Chen, "A Wearable Inertial Sensor Node for Body Motion Analysis," IEEE Sensors Journal, vol. 12, no. 3, March 2012.
[5] C. H. Hyoung, J. B. Sung, J. H. Hwang, J. K. Kim, D. G. Park, and S. W. Kang, "A novel system for intrabody communication: Touch-And-Play," Proc. ISCAS, pp. 1343-1346, May 21-23, 2006.
[6] Linlin Zhang and Yueming Gao, "Design of Human Motion Detection Based on the Human Body Communication," in Proc. IEEE TENCON 2014, Macau SAR, China, Nov. 2015, pp. 1-4.
[7] K. Hachisuka, A. Nakata, T. Takeda, Y. Terauchi, K. Shiba, K. Sasaki, H. Hosaka, and K. Itao, "Development and performance analysis of an intra-body communication device," in Proc. 12th IEEE Int. Conf. Solid-State Sens., Actuators, Microsyst., pp. 1722-1725, Jun. 2003.
[8] T. Leng, Z. Nie, W. Wang, F. Guan, and L. Wang, "A human body communication transceiver based on on-off keying modulation," in Proc. IEEE ISBB 2011, China, Nov. 2011, pp. 61-64.
[9] Ž. Lučev, I. Krois, and M. Cifrek, "A multichannel wireless EMG measurement system based on intrabody communication," XIX IMEKO World Congress, Fundamental and Applied Metrology, Lisbon, Portugal, pp. 1711-1715, September 2009.
[10] Ruoyu Xu and Hongjie Zhu, "Electric-Field Intrabody Communication Channel Modeling With Finite-Element Method," IEEE Transactions on Biomedical Engineering, vol. 58, no. 3, March 2011.

SAT-based Search for Systems of Diagonal Latin Squares in Volunteer Computing Project SAT@home

Oleg Zaikin, Stepan Kochemazov, Alexander Semenov
Matrosov Institute for System Dynamics and Control Theory SB RAS, Irkutsk, Russia
Email: zaikin.icc@gmail.com, veinamond@gmail.com, biclop.rambler@yandex.ru

Abstract—In this paper we consider the problem of finding pairs of mutually orthogonal diagonal Latin squares of order 10. First we reduce it to the Boolean satisfiability problem. The obtained instance is very hard, therefore we decompose it into a family of subproblems. To solve the latter we use the volunteer computing project SAT@home. In the course of a 10-month-long computational experiment we managed to find 29 pairs of the described kind that differ from the already known pairs. We also consider the problem of searching for triples of diagonal Latin squares of order 10 that satisfy a weakened orthogonality condition. Using diagonal Latin squares from the known pairs (most of them found in SAT@home) we constructed new triples of the proposed kind. In this computational experiment we used a computing cluster.

I. INTRODUCTION

Combinatorial problems related to Latin squares have been of interest to mathematicians since the time of Leonhard Euler. A lot of general information about these problems can be found in [1]. A Latin square of order n is a square n x n table filled with elements from some set M, |M| = n, in such a way that each element from M appears in each row and each column exactly once. Initially Euler used the set of Latin letters as M, which is why the corresponding combinatorial designs were named Latin squares. In this paper, for convenience, we use as M the set {0, ..., n - 1}. A Latin square is called diagonal if both its primary and secondary diagonals contain all numbers from 0 to n - 1. In other words, the uniqueness constraint is extended from rows and columns to the two diagonals.
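These two definitions translate directly into a simple check, sketched below for illustration only; it is unrelated to the propositional encoding used later in the paper.

```python
def is_diagonal_latin_square(square):
    """Check that an n x n array over {0,...,n-1} is a diagonal Latin square:
    every row, every column and both main diagonals contain each value once."""
    n = len(square)
    full = set(range(n))
    rows = all(set(row) == full for row in square)
    cols = all({square[r][c] for r in range(n)} == full for c in range(n))
    main_diag = {square[i][i] for i in range(n)} == full
    anti_diag = {square[i][n - 1 - i] for i in range(n)} == full
    return rows and cols and main_diag and anti_diag
```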
A pair of Latin squares of the same order is called orthogonal if all ordered pairs of the kind (a, b) are different, where a is the number in some cell of the first Latin square and b is the number in the same cell of the second Latin square. If there are m different Latin squares in which every pair is orthogonal, they are called a system of m mutually orthogonal Latin squares (MOLS). One of the most well-known unsolved problems in this area is the following: does there exist a triple of MOLS of order 10? There are many different approaches to solving problems regarding combinatorial designs. In the present paper we use the so-called SAT approach [2]. Basically, it means that we reduce the original problem to the Boolean satisfiability problem (SAT) and then apply state-of-the-art SAT solvers to the obtained instance. The attractiveness of this approach is justified by the fact that many problems from different areas (for example, software verification, cryptography or bioinformatics) can be effectively reduced to SAT. Despite the fact that SAT is NP-hard, and all known algorithms for solving SAT are exponential in the worst case, state-of-the-art heuristic algorithms manage to solve SAT instances encoding practical problems from various areas in reasonable time. The majority of such fast SAT solvers are based on the Conflict-Driven Clause Learning (CDCL) paradigm [3]. Evidently, among SAT instances there are very hard variants, and to solve them in reasonable time it is necessary to involve significant amounts of computational resources. That is why improving the effectiveness of SAT solving algorithms, including the development of algorithms that are able to work in parallel and distributed computing environments, is a very important direction of research. In 2011 we launched the volunteer computing project SAT@home aimed at solving hard SAT instances [4]. One of the aims of the project is to find new combinatorial designs based on systems of orthogonal Latin squares. Let us briefly outline the paper. In the second section we describe the volunteer computing project SAT@home. In the third section we detail how we apply the SAT approach to finding pairs of mutually orthogonal diagonal Latin squares (MODLS) of order 10 in SAT@home. In the fourth section we show how the pairs of MODLS found in SAT@home can be used to search for triples of diagonal Latin squares of order 10 that satisfy a weakened orthogonality condition, and we discuss the results of our computational experiment, performed using a computing cluster.

II. VOLUNTEER COMPUTING PROJECT SAT@HOME

Volunteer computing [5] is one of the types of distributed computing. Its defining characteristic is that it uses the computational resources of volunteers' PCs. The majority of volunteers are ordinary people, not affiliated with a single organization or group of companies. Usually, one volunteer computing project is designed to solve one or several closely related hard problems. It is important to note that when a volunteer's PC is connected to the project, all the calculations on it are performed automatically and do not inconvenience the user, since only the idle resources of the PC are employed. Another distinctive feature of volunteer computing projects is that they can use only embarrassing parallelism [6], i.e. the original problem should be decomposed into a family of subproblems that can be solved independently from each other.
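Each such independent subproblem ultimately asks for squares satisfying the orthogonality condition defined at the beginning of this section. That condition, and the count of distinct superimposed pairs from which the "characteristics" of partially orthogonal systems discussed later is derived, can be checked directly; the sketch below is illustrative and independent of the SAT encoding.

```python
def distinct_pairs(a, b):
    """Number of distinct ordered pairs (a[r][c], b[r][c]) over all cells."""
    n = len(a)
    return len({(a[r][c], b[r][c]) for r in range(n) for c in range(n)})


def are_orthogonal(a, b):
    """Two order-n Latin squares are orthogonal iff all n*n pairs are distinct."""
    n = len(a)
    return distinct_pairs(a, b) == n * n
```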
Volunteer computing is very cheap — to maintain a project one needs only a dedicated server and several client applications fitting into one infrastructure. Empirically, the main difficulty here lies in software development and database administration. Also it is crucial to provide a feedback to project users via web site and special forums. An attractive consequence of such structure is that volunteer project can spend several months or even years to solve one hard instance. The majority of currently functioning volunteer computing projects are based on the Berkeley Open Infrastructure for Network Computing (BOINC) [5], developed in Berkeley in 2002. Overall, at the present moment there are about 70 active volunteer projects. Total performance of these projects exceeds 11 PFLOPs. Structurally, volunteer computing project can be considered as a sum of the following parts: server daemons, database, web site and client applications. Here daemons include work generator (it generates tasks to be processed by volunteers PSs), validator (that checks the correctness of the results received from volunteers PCs) and assimilator (it processes correct results). The client applications should have versions for the widespread computing platforms. In 2011 the authors of the paper in collaboration with colleagues from IITP RAS developed the volunteer computing project SAT@home [4]. On February 7, 2012 this project was added to the official list of BOINC projects1 with alpha status. In 2015 the status was upgraded to beta. The main goal of the project is to solve hard SAT instances from various subject areas. SAT@home is based on the BOINC platform. Currently (as of January 22, 2016) the project has about 3784 active PCs from users all over the world, and has the average performance of 8.4 TFLOPs. Let us consider the basic features of the SAT@home project in more detail. The work generator daemon is located on the server. It decomposes the original SAT instance into a family of subproblems. For this purpose it uses decomposition parameters, found via the special Monte Carlo method [7] on a computing cluster. On this stage it is necessary to use a computing cluster because the corresponding computational scheme uses a lot of interprocessor exchange (essentially, it employs fine-grained parallelism). According to the concept of redundant calculations used in BOINC, the work generator creates 2 copies of each task to be processed by 2 users from different teams (it decreases the possibility of cheating). The validator daemon checks if both copies of a single task yielded the same result and then gives verified results to the assimilator daemon. If there was found satisfying assignment then the assimilator checks it, and if it is correct, then it marks the original SAT instance as solved. The client application in SAT@home is based on the slightly modified CDCL solver M INI S AT [8]. 1 http://boinc.berkeley.edu/projects.php 294 In May, 2012, the six-month computational experiment, aimed at solving 10 instances of A5/1 keystream generator cryptanalysis, successfully ended in SAT@home [7]. Note that in this experiment there were considered only instances, that could not be solved via known Rainbow tables. For each instance the goal was to find unknown initial values of generator registers. In 2014 in SAT@home there were successfully solved five weakened cryptanalysis instances for the Bivium keystream generator [9]. 
In each instance the values of 9 out of 177 bits corresponding to the initial values of the generator registers (at the end of the initialization phase) were known, to make the instances solvable in reasonable time. In general, the SAT@home project is a powerful tool for solving hard SAT instances. In the next sections we describe how we apply it to the search for combinatorial designs.

III. FINDING PAIRS OF MUTUALLY ORTHOGONAL DIAGONAL LATIN SQUARES OF ORDER 10 IN SAT@HOME

The existence of a pair of MODLS of order 10 was proved in 1992: in the paper [10] three such pairs were presented. In 2012 we started in SAT@home a computational experiment aimed at finding new pairs of MODLS of order 10. The experiment was finished in 2013 and its results were published in [11]. First we constructed a propositional encoding for this problem. The obtained CNF had 2000 Boolean variables and 434440 clauses; the size of the corresponding conjunctive normal form (CNF) in the DIMACS format was 10 MB. We used the so-called "naive" encoding (see, for example, [12]). We decomposed the obtained SAT instance as follows. The first row of the first diagonal Latin square was fixed to be equal to "0 1 2 3 4 5 6 7 8 9"; this does not lead to a loss of generality because of the properties of Latin squares. After this we processed all possible values of the first 8 cells of the second and the third rows. As a result we obtained a family of subproblems, in which for each subproblem the values of the first diagonal Latin square were fixed for 26 out of 100 cells (10 from the first row and 8 from each of the second and third rows). In terms of the CNF this translated into assigning values to 260 out of 2000 variables in each SAT instance. In the SAT@home experiment each job batch contained 20 such SAT instances. For each instance MINISAT had a limit of 2600 restarts, which is more or less equal to 4 minutes on one core of a state-of-the-art CPU. To process the 20 million subproblems generated for the experiment it took about 9 months of work of the SAT@home project (from September 2012 to May 2013). As a result we found 17 new pairs of MODLS of order 10 (in addition to the three previously known pairs from [10]). In April 2015 we started in SAT@home another experiment on the search for new pairs of MODLS of order 10. Note that in the experiment held in 2012 (see [11]) we used a decomposition chosen according to some rational considerations; in the new experiment we decided to use a more systematic approach to choosing the decomposition. In Table 1 we show the data for 7 variants of decomposition (including the one suggested in [11]) and the results obtained for them.

TABLE I. THE RESULTS OF EXPERIMENTS FOR DIFFERENT DECOMPOSITIONS.
Decomposition          Solutions found   Time
1 row, 9 cells         1                 1 month in 2015
2 rows, 2 cells each   -                 1 day in 2015
2 rows, 3 cells each   -                 3 days in 2015
2 rows, 4 cells each   -                 2 weeks in 2015
2 rows, 5 cells each   23                4 months in 2015-2016
2 rows, 6 cells each   5                 3 months in 2015-2016
2 rows, 8 cells each   17                9 months in 2012-2013

In total, from April 17, 2015 to February 7, 2016 we managed to find 29 new, previously unknown pairs of the considered kind (compared to 3 pairs from [10] and 17 pairs from [11]). All found solutions are available online at the web site of the SAT@home project. In this experiment we used the same client application with the same limit on the number of restarts as in 2012.
In all decompositions we considered first cells from the left (it means that since we always fix the first row, then we vary the rows starting from the second). Let us comment the Table 1. The decomposition “2 rows, 8 cells each” corresponds to the decomposition from [11]. All the other 6 decompositions have been tested in 2015-2016. The decomposition “2 rows, 2 cells each”, “2 rows, 3 cells each” and “2 rows, 4 cells each” did not yield any results under the experiment rules. The decomposition “2 rows, 5 cells each” turned out to be more effective than “2 rows, 8 cells each” used in [11], because it made it possible to find more new pairs per time unit (even if we take into account that the performance of the SAT@home project in 2015 was twice that in 2012). It also was the most effective compared to all other decompositions we tested in 2015-2016. Using “2 rows, 6 cells each” we managed to find several new pairs, but ifs effectiveness compared to “2 rows, 5 cells each” turned out to be low, that is why the corresponding experiment was suspended. IV. F INDING T RIPLES OF M UTUALLY PARTIALLY O RTHOGONAL D IAGONAL L ATIN S QUARES OF O RDER 10 U SING C OMPUTING C LUSTER Let us consider a triple of Latin squares of the same order. Among all possible sets of ordered pairs of elements, that form when we look at this in the same manner as when we check the orthogonality condition for all three pairs of squares simultaneously, we choose the set with maximal power. Let us refer to the power of this set as characteristics of the considered triple of Latin squares. Currently the record triple of mutually partially orthogonal Latin squares of order 10 in this notation is the one published in [13]. In this triple square A is fully orthogonal to squares B and C, but squares B and C are orthogonal only over 91 pairs of elements out of 100. It means that in our notation the characteristics of this triple is MIPRO 2016/DC VIS 91. We are focused on finding triples of diagonal Latin squares of order 10, that have high value of characteristics. In [11] we suggested to consider the problem of constructing triples of MODLS of order 10 in the form of SAT problem. We proposed a propositional encoding, in which by editing a special clause it was possible to set up the desired value of characteristics. When we add to this CNF the known values of two known Latin squares that form an orthogonal pair, it is reduced to finding values of remaining unknown Boolean variables in such a way that the resulting third diagonal Latin square forms with the considered known pair the partially orthogonal triple with desired value of characteristics. For each known pair (at that moment there were only 20) we constructed a separate CNF by assigning values to Boolean variables corresponding to the elements of known pair. We employed the multi-threaded CDCL solver TREENGELING [14], that was launched on every obtained SAT instance with time limit equal to 1 hour on one computing node of “Academician V.M. Matrosov” computing cluster of ISC SB RAS2 . One node of this cluster contains 32 CPU cores and 64 Gb RAM. Using this approach it was possible to find the triple of proposed kind of squares of order 10 with characteristics equal to 73 (i.e. only for 1 considered SAT instance TREENGELING managed to find a satisfying solution, calculations on the other SAT instances were interrupted). 
It should be noted, that the triple of MODLS of order 10 from [10] has characteristics 60, but in that triple 2 pairs of squares out of 3 are orthogonal (opposite to 1 out of 3 in the triple from [11]). Since in 2015-2016 we found 29 more new pairs of MODLS of order 10, we constructed 29 more SAT instances in addition to 20 considered in [11]. UsingTREENGELING with the same time limit (1 hour) we found two more partially orthogonal triples with characteristics equal to 73 (see (1) and (2)). 0123456789 1902634875 4276398510 9018573642 3795182064 5647021398 6381245907 7569810423 8450967231 2834709156 0123456789 2438910567 6981074235 5602137894 8516703942 3079865421 7264589310 4850392176 9347621058 1795248603 3825079164 1743862059 6587431290 8204697531 4690583712 7462918305 2976150843 5031246978 9158304627 0319725486 (1) 0123456789 1290378564 4869201357 2971865043 3607589421 9584137206 5732614890 6348720915 8015943672 7456092138 0123456789 9567241803 5390784162 1258037694 8719325046 7941560328 4876102935 2085693471 6432879510 3604918257 9310475826 3245809671 6508347219 5081623794 7963581042 1497062538 8629754103 2754198360 0172936485 4836210957 (2) 2 http://hpc.icc.ru 295 These triples are based on the 5-th and 16-th pairs of MODLS of order 10 found in the experiment respectively (in each triple the first two diagonal Latin squares are from the corresponding pair of MODLS). The propositional encoding proposed in [11] made it possible to obtain one more class of results. In this encoding we can set the constraint that the characteristics of the triple is 100, thus formulating the problem of search for a triple of MODLS of order 10. In this case when we augment the SAT instance with the known values of cells of two squares, the problem transforms to the following: for a given pair of MODLS of order 10 to find the third diagonal Latin square that forms a triple of MODLS of order 10, or to prove that there is no such square. Thus we constructed 49 SAT instances with assignments of values corresponding to 49 known pairs. For each instance there was added the constraint specifying that the characteristics of the triple is equal to 100. On average to solve one such SAT instance the PLINGELING CDCL solver [14] took about one second, using 32 processor cores (one computing node of the Academician Matrosov cluster of ISC SB RAS). In all cases it was proven that there is no square satisfying specified limitations (this is a consequence of the fact, that all CNFs were unsatisfiable). We even managed to significantly improve this result. When we reduced the specified value of characteristics, the SAT instances considered were getting progressively harder. At the present moment we proved for 49 known pairs that they cannot be used to construct sets of three partially orthogonal diagonal Latin squares of order 10 with the characteristic equal to 87. The solving of corresponding SAT instances took PLINGELING SAT solver on average about 8 hours per instance on 1 cluster node. V. R ELATED W ORKS There are several examples of application of high performance computing to the search for combinatorial designs based on Latin squares. In [15] there was proven that there is no finite projective plane of order 10 via special algorithms based on constructions and results from the theory of error correcting codes. The corresponding experiment took several years. On its final stage there was used quite a powerful (at that moment) computing cluster. 
More recent example is the proof of hypothesis about the minimal number of clues in Sudoku [16] where special algorithms were used to enumerate and check all possible Sudoku variants. To solve this problem a modern computing cluster had been working for almost a year. The volunteer computing project Sudoku@vtaiwan [17] was used to confirm the solution of this problem. In [18] there was described the application of SAT solvers to finding systems of orthogonal Latin squares. Among other things, the author of [18] used a specially constructed small desktop grid for more than 10 years in an attempt to find a triple of MODLS of order 10. Unfortunately, the experiment did not yield any success. Apparently, [19] became the first paper about the use of a desktop grid based on the BOINC platform for solving SAT. 296 It did not evolve into a publicly available volunteer computing project (like SAT@home did). VI. C ONCLUSION In the paper we describe the results obtained by applying high performance computing to SAT-based searching for systems of diagonal Latin squares of order 10. Using SAT@home we found 29 pairs of MODLS of order 10 (in addition to 20 previously known pairs). Using a computing cluster we found two triples of partially orthogonal diagonal Latin squares of order 10 with characteristics equal to 73 (in addition to one such triple found earlier). We also proved that based on all 49 known pairs of MODLS of order 10 it is impossible not only to construct a triple of MODLS of order 10, but even to construct a triple of partially orthogonal diagonal Latin squares of order 10 with characteristics equal to 87. ACKNOWLEDGMENT The research was funded by Russian Science Foundation (project No. 16-11-10046). R EFERENCES [1] C. J. Colbourn and J. H. Dinitz, The CRC handbook of combinatorial designs. CRC Pr I Llc, 1996. [2] A. Biere, M. J. H. Heule, H. van Maaren, and T. Walsh, Eds., Handbook of Satisfiability, ser. Frontiers in Artificial Intelligence and Applications. IOS Press, February 2009, vol. 185. [3] J. Marques-Silva, I. Lynce, and S. Malik, Conflict-Driven Clause Learning SAT Solvers, ser. Frontiers in Artificial Intelligence and Applications. IOS Press, February 2009, vol. 185, ch. 4, pp. 131–153. [4] M. Posypkin, A. Semenov, and O. Zaikin, “Using BOINC desktop grid to solve large scale SAT problems,” Computer Science (AGH), vol. 13, no. 1, pp. 25–34, 2012. [5] D. P. Anderson and G. Fedak, “The computational and storage potential of volunteer computing,” in Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 16-19 May 2006, Singapore. IEEE Computer Society, 2006, pp. 73–80. [6] I. Foster, Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1995. [7] A. Semenov and O. Zaikin, “Using monte carlo method for searching partitionings of hard variants of boolean satisfiability problem,” in Parallel Computing Technologies - 13th International Conference, PaCT 2015, Petrozavodsk, Russia, August 31 - September 4, 2015, Proceedings, ser. Lecture Notes in Computer Science, V. Malyshkin, Ed., vol. 9251. Springer, 2015, pp. 222–230. [8] N. Eén and N. Sörensson, “An extensible sat-solver,” in Theory and Applications of Satisfiability Testing, 6th International Conference, SAT 2003. Santa Margherita Ligure, Italy, May 5-8, 2003 Selected Revised Papers, ser. Lecture Notes in Computer Science, E. Giunchiglia and A. Tacchella, Eds., vol. 2919. Springer, 2003, pp. 
502–518. [9] O. Zaikin, A. Semenov, and I. Otpuschennikov, “Solving weakened cryptanalysis problems for the Bivium cipher in the volunteer computing project SAT@home,” in Second International Conference BOINC-based High Performance Computing: Fundamental Research and Development (BOINC:FAST 2015), Petrozavodsk, Russia, September 14-18, 2015, ser. CEUR-WS, vol. 1502, 2015, pp. 22–30. [10] J. Brown, F. Cherry, L. Most, E. Parker, and W. Wallis, “Completion of the spectrum of orthogonal diagonal latin squares,” Lecture notes in pure and applied mathematics, vol. 139, pp. 43–49, 1992. [11] O. Zaikin and S. Kochemazov, “The search for systems of diagonal Latin squares using the SAT@home project,” in Second International Conference BOINC-based High Performance Computing: Fundamental Research and Development (BOINC:FAST 2015), Petrozavodsk, Russia, September 14-18, 2015, ser. CEUR-WS, vol. 1502, 2015, pp. 52–63. [12] I. Lynce and J. Ouaknine, “Sudoku as a SAT problem,” in International Symposium on Artificial Intelligence and Mathematics (ISAIM 2006), Fort Lauderdale, Florida, USA, January 4-6, 2006, 2006. MIPRO 2016/DC VIS [13] J. Egan and I. M. Wanless, “Enumeration of MOLS of small order,” Math. Comput., vol. 85, no. 298, pp. 799–824, 2016. [14] A. Biere, “Lingeling essentials, A tutorial on design and implementation aspects of the the SAT solver lingeling,” in POS-14. Fifth Pragmatics of SAT workshop, a workshop of the SAT 2014 conference, part of FLoC 2014 during the Vienna Summer of Logic, July 13, 2014, Vienna, Austria, ser. EPiC Series, D. L. Berre, Ed., vol. 27. EasyChair, 2014, p. 88. [15] C. Lam, L. Thiel, and S. Swierz, “The nonexistence of finite projective planes of order 10,” Canad. J. Math., vol. 41, pp. 1117–1123, 1989. [16] G. McGuire, B. Tugemann, and G. Civario, “There is no 16-clue sudoku: Solving the sudoku minimum number of clues problem via hitting set enumeration,” Experimental Mathematics, vol. 23, no. 2, pp. 190–217, 2014. [17] H.-H. Lin and I.-C. Wu, “Solving the minimum sudoku problem,” in The 2010 International Conference on Technologies and Applications of Artificial Intelligence, ser. TAAI ’10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 456–461. [18] H. Zhang, Combinatorial Designs by SAT Solvers, ser. Frontiers in Artificial Intelligence and Applications. IOS Press, February 2009, vol. 185, pp. 533–568. [19] M. Black and G. Bard, “SAT over BOINC: an application-independent volunteer grid project,” in 12th IEEE/ACM International Conference on Grid Computing, GRID 2011, Lyon, France, September 21-23, 2011, S. Jha, N. gentschen Felde, R. Buyya, and G. Fedak, Eds. IEEE Computer Society, 2011, pp. 226–227. MIPRO 2016/DC VIS 297 Architectural Models for Deploying and Running Virtual Laboratories in the Cloud E. Afgan1,2, A. Lonie3, J. Taylor1, K. Skala2, N. Goonasekera3,* 1 Johns Hopkins University, Biology department, Baltimore, MD, USA Ruder Boskovic Institute (RBI), Centre for Informatics and Computing, Zagreb, Croatia 3 University of Melbourne, Victorian Life Sciences Computation Initiative, Melbourne, Australia enis.afgan@jhu.edu, alonie@unimelb.edu.au, jxtx@jhu.edu, skala@irb.hr, ngoonasekera@unimelb.edu.au 2 Abstract - Running virtual laboratories as software services in cloud computing environments requires numerous technical challenges to be addressed. Domain scientists using those virtual laboratories desire powerful, effective and simple-to-use systems. 
To meet those requirements, these systems are deployed as sophisticated services that require a high level of autonomy and resilience. In this paper we describe a number of deployment models based on technical solutions and experiences that enabled our users to deploy and use thousands of virtual laboratory instances.

* Corresponding author

I. INTRODUCTION

The past decade has seen cloud computing go from conception to the de facto standard platform for application deployment. Cloud infrastructures are delivering resources for deploying today's applications that are more scalable [1], more cost-effective [2], more robust [3], more easily managed [4], and more economical [5]. Researchers and research groups are no different from the rest of the industry, expecting robust, powerful cloud platforms capable of handling their data analysis needs. However, deploying such platforms still requires a significant amount of effort and technical expertise. In this paper, we build on our experiences from more than five years of building and managing virtual laboratories that have been deployed thousands of times on clouds around the world. We present viable architectural deployment models and extract best practices for others developing or deploying their own versions of robust research platforms. The theme of this paper revolves around deploying the concept of a virtual laboratory as a platform for performing data analysis [6]. Virtual labs offer access to a gamut of data analysis tools and workflow platforms that are closely linked to commonly used datasets; they offer access to scalable infrastructure that has been appropriately configured, beforehand as well as dynamically at runtime. Once built, virtual labs are often deployed on demand by the researchers themselves. However, in order to make these platforms available to domain researchers, the necessary components have to be built, configured, and provisioned. Depending on the complexity of a virtual lab, this is often a complex task spanning expertise in system administration, platform development, and domain-specific application setup. In addition to deploying variations of said virtual labs on public clouds, institutions are increasingly setting up academic clouds (e.g., NeCTAR, Chameleon, JetStream). Locality of the infrastructure, restrictions on off-shoring data [7], avoiding vendor lock-in and the no-cost or merit-based allocation of resources are attractive reasons for utilizing those clouds. From a platform deployment standpoint, this brings up additional challenges because the platforms need to be deployed, managed, maintained, and supported on these additional clouds while coping with any differences among the cloud providers. It is hence imperative to design scalable, robust and cloud-agnostic models for deploying these systems. Figure 1 captures the core concepts enabling the development of such models: (a) a cross-cloud API layer; (b) automation; (c) a configurable and 'composable' set of resources. These concepts, detailed in the remainder of the paper, embody the notion that successfully building a global virtual lab requires a common platform rooted in automation.

Figure 1. Virtual lab deployment stack unified for multiple clouds (virtual labs and their resource sets are built and deployed automatically on top of a common API that maps onto each cloud's native API).

II. FUNCTIONAL REQUIREMENTS

The choice of a virtual lab architecture is driven by a variety of aspects and cross-cutting concerns [8].
While some of these decisions are general architectural decisions applicable to software in general, and some are highly specific to the domain in question, there are some concerns which are applicable to virtual labs in general. In this section, we provide a treatment of such concerns, and list some of the various architectural concerns that must be addressed in designing and developing a virtual lab environment. For example, a virtual lab would need to determine the level of customisation that is required by a user. If a significant customisation is required, it is often the case that it will impact other users, and therefore, isolated or MIPRO 2016/DC VIS individualised access to resources is preferable over access to a common pool of shared resources. For example, a userowned container or virtual machine, as opposed to a predeployed web service. Similarly, small job sizes can typically be catered to by a single, individualised VM, whereas large job sizes may require an architecture that can dynamically scale to accommodate more diverse needs. The choice of appropriate strategy is dependent on several additional factors, including the purpose of the TABLE I. Challenges virtual lab, target cloud(s) capabilities, available people effort commitment and similar. Table 1 supplies a core list of design questions to answer when weighing the available options. Note that there is no single answer to the supplied questions but they are largely dependent on the aims of the virtual lab. Deciding what are the acceptable answers for the particular lab will help guide the myriad of technical choices related to implementation. In the sections that follow, we discuss various compute and data provisioning strategies that can accommodate these decisions. FUNCTIONAL DESIGN QUESTIONS TO CONSIDER WHEN DESIGNING A VIRTUAL LAB. Description Infrastructure Infrastructure maturity Which cloud to use? How stable/mature is the infrastructure? Will the deployed lab be robust for users? Infrastructure agnosticism How easy is it to support multiple infrastructure providers? Is this desirable/necessary to increase accessibility/robustness? Support What type of support does the provider offer, for the virtual lab and individual users? User Management Per-user customisation Can each user customise the virtual lab according to their needs, and have a safe environment in which to learn through failure? Data management How is data put into/taken out of the virtual lab? Quota management What resource quotas should be enforced for the user? Does the infrastructure provider support that? Users’ Management of VL Instance lock-in Is an upgrade path available so that the user can always use the latest version of the virtual lab? Replicability Can the user replicate their experiments - with a guarantee that all software versions remain unchanged? Reliability How reliable should the service be? Can losses be tolerated? Service Management Software management Can a user manage the software on the virtual lab on their own, or do they need system administration skills? Licensing constraints Are there specific licensing constraints that limit the use of the software? Security Authentication Should the virtual lab allow for single-signon with institutional credentials? Credentials How are institutional credentials translated into cloud provider credentials? Authorization What actions are users allowed to perform within a virtual lab? III. 
VIRTUAL LAB INFRASTRUCTURE COMPONENTS Answering above questions and provisioning a virtual lab requires a marriage of various, complex software components to their required storage and processing MIPRO 2016/DC VIS resource requirements. Depending on the intended usage for the virtual lab, there are a number of choices regarding the use of appropriate cloud resources. Table 2 provides a snapshot of the available approaches for supplying compute capacity along with the pros and cons for each option. 299 TABLE II. Provisioning strategy COMPUTE PROVISIONING STRATEGIES. Description Pros Cons Machine Image A pre-built machine image with all required software already installed. • Quick startup • Excellent reproducibility • Difficult to upgrade due to monolithic nature • Software packages not self-contained causing potential version conflicts • Software potentially tied to OS version • Limitations on size • Breached applications may affect entire machine Container A pre-built container (such as Docker or LXC), which is deployed on top of a running Machine Image or a cloud container service. • Extremely quick startup • Excellent reproducibility • Containers mostly independent of underlying machine’s operating system and version • Easier updates to individual components • Breaches contained to container • Must be pre-built • Still very new so quickly changing technology Runtime Required software installed at runtime, using automation software such as Ansible, Chef, Puppet, etc. • Push or Pull update models • Updates easier to make • Slow deployment/startup times • Less reproducible (transient network errors, software version changes) Hybrid A pre-built machine image/container for quick startup, brought up-to-date through runtime deployment/ extensibility. • Can benefit from the advantage of all of the above models • More complex to implement Compute resources need to be matched with suitable storage capacity. Table 3 differentiates among the currently available cloud storage resource types and captures the pros and cons of each. The supplied information examines ways that data can be brought to the compute infrastructure, since this is still the dominant way to work with existing TABLE III. Storage Model scientific software. We do not consider the reverse model as many virtual labs still struggle to shed the weight of their accumulated legacy software that require a shared, UNIXbased file-system to run, and are not yet in a position to take advantage of such models despite their benefits. DATA PROVISIONING STRATEGIES FOR GETTING REQUIRED STATIC DATA TO THE COMPUTE. Description Pros Cons Volumes / Snapshots A volume or snapshot containing the required data, which is attached to an instance at runtime. • Quick to create/attach • Suitable for large amounts of data • Not shareable between clouds (in OpenStack, not shareable between projects) • Not guaranteed to be available (e.g., infrastructure/quota restrictions) • Limited sharing ability between nodes (e.g., volumes only attachable to 1 instance at a time) Shared POSIX filesystem A shared filesystem containing the required data (e.g. NFS, Gluster), which is mounted on the target node at runtime. 
• One-time setup • Very fast to attach • Updates visible at runtime to all virtual lab instances • Suitable for very large amounts of data • Must be setup on each supported cloud • Centralised management/single point of failure • Not geographically scalable Remotely fetched data archive An http/ftp link containing the required data, which is downloaded and extracted onto local/transient storage. • Cloud agnostic • Scalable • Slow - takes a long time to fetch and extract • Not suited to very large amounts of data - to reduce download times/costs Object-store Object-storage service provided by the cloud provider (e.g. S3, Swift). • High scalability • Not suitable for random access • Not supported by legacy tools 300 MIPRO 2016/DC VIS IV. have appropriate access to the cloud provider where the image is available and must personally launch an instance of the virtual lab; various launcher applications can make this a straightforward process. Once launched, the user will have full control over the virtual lab services but will also need to manage the services, particularly when upgrades or fixes are necessary. In addition to managing the services, the user is in charge of data management, ensuring that the data is not lost when an instance is terminated. Virtual lab providers need to bundle the virtual lab into an instantiatable image, such as a virtual machine image or a container, and provide periodic upgrades to the image. Examples include CloudBioLinux [10]; DEPLOYMENT OPTIONS The resources available and required to compose a virtual lab can be assembled in a variety of configurations. Configurations support different use cases and require varying levels of technical complexity to deploy. Hence, depending on the intended purpose of the virtual lab, it is important to choose an appropriate deployment model. We define the following deployment models and supply a flowchart in Figure 2 to navigate among the models: • • Centrally managed resource is a virtual lab which is presented as a public service to the community. Typically available as a web portal, this virtual lab requires little or no setup from the user’s side and permits the user to readily utilize resources offered by the virtual lab. Because it is a public resource, the user is likely to experience limited functionality, such as usage quotas, no ssh access, no possibility for customisation and other similar constraints typical of public services. While the users do not require any setup for this type of virtual lab, the lab maintainers need to manage and update underlying infrastructure supporting the supplied services. Resource management needs to account for the scaling of the supplied services, upgrades, and reproducibility of user’s results. In addition to accessibility, other main drivers for choosing this model for lab deployment are data management restrictions (in case data is too large for feasible sharing) and software licensing constraints. Examples of this type of virtual lab include XSEDE science gateways (https://portal.xsede.org/ web/guest/gateways-listing), Characterisation Virtual Lab (https://www.massive.org.au/cvl/), usegalaxy.org portal [9]; Standalone image represents a feature-full version of the virtual lab in a small package. A user is required to Will the virtual lab be used as a shared, non-customisable community service? No Yes Yes Standalone VM/container from an image Are the number of users and workload sizes predictable? Yes Are the anticipated workloads small? 
• Persistent short-lived scalable cluster is a dynamically scalable version of the virtual lab image with additional services to handle infrastructure scaling. These services (i.e., cluster management services) are used to provision a virtual cluster at runtime (e.g., Slurm, SGE, Hadoop) or utilize cloud provider services for scaling (e.g., container engine). The cluster manager software will also supply additional cluster management services, such as cluster persistence, allowing a user to shutdown the cluster when not in use while ensuring the data is preserved. This deployment model requires coordination of several resource types from Section 3 and use of cluster management software, hence implying a significant deployment effort from the virtual lab deployers. Examples include the Genomics Virtual Lab (GVL) [11]; • Long-lived scalable cluster has the same characteristics of a short-lived cluster as well as the ability to upgrade running services. The upgrades are typically handled by the cluster management software. No Are the data analysis needs periodic? No Long-lived scalable virtual cluster Yes Persistent short-lived scalable virtual cluster No Statically sized centrally managed resource Dynamically scalable centrally managed resource Figure 2. V. Data provisioning strategies for getting required static data to the compute. DISCUSSION In addition to the hardware and functional requirements for establishing a virtual lab, there are other important technical and management decisions that affect its MIPRO 2016/DC VIS deployment. One of the key attractions for using virtual labs is the high-level, software-as-a-service experience delivered to a user. An implication is that the functions offered by the deployed services need to function well, 301 which drives a need for good testing strategies and quality assurance. However, complex services and frequent releases make this a challenge for the deployers. It is hence advisable to automate the testing procedure and develop a quality control process before each release. Ideally, the testing process is decentralized, testing the individual services for proper functionality and focusing the testing of the virtual lab on the configuration setup. Such testing can be achieved by adopting a user-centric view of the virtual lab and using tools such as Selenium to automate the typical user actions [12]. Further, virtual labs should be complemented with a set of training materials, describing all the steps required to access the services supplied by a virtual lab. Additional training materials for using the services are also beneficial, particularly if they are accompanied with webinars or handon workshops. Besides the technical implementation of the virtual lab, arguably the most challenging piece of a virtual lab is longterm support. Domain researchers will rely on the virtual lab to perform data analyses and publish new knowledge based on the obtained results. For reproducibility purposes, it is hence important to maintain their access to the resources required to use a virtual lab. When using a commercial cloud provider and supplying shared resources, note should be taken to questions of what happens when the project funding runs out or software being used becomes obsolete. Such questions imply that the virtual lab should make appropriate provisions to allow all the data and accompanying methods to be downloaded or transferred off the infrastructure initially used. Related to the long term support is the notion of upgradeability. 
For example, a user working with a particular version of a virtual lab, may wish to upgrade to the latest available version. It is generally undesirable to foist an upgrade on users, as this can adversely affect reproducibility when software versions are changed. Therefore, a controlled migration or exit path is often necessary so that users can switch to newer versions of a virtual lab when appropriate for their circumstances. VI. SUMMARY With the increased proliferation of cloud computing infrastructures, we believe the concept of a virtual lab - a composite platform capable of performing open-ended data analyses - will become a prevalent platform for utilizing cloud resources by researchers. In this paper we’ve described the components required to compose a virtual lab. Technical and managerial aspects of the decision making process have been presented shining light on the tradeoffs among viable options. Looking to the future, it is expected that the concept of a virtual lab will continue to evolve towards a more integrated, quickly deployable system that is instantly accessible by users. Containers, automation solutions, and serverless runtime platforms are likely key technologies that will be adopted to realize this evolution. ACKNOWLEDGMENTS This project was supported in part through grant VLS402 from National eCollaboration Tools and 302 Resources, grant eRIC07 from Australian National Data Service, grant number HG006620 from the National Human Genome Research Institute, and grant number CA184826 from the National Cancer Institute, National Institutes of Health. REFERENCES [1] Q. Zhang, L. Cheng, and R. Boutaba, “Cloud computing: state-of-the-art and research challenges,” J. Internet Serv. Appl., vol. 1, no. 1, pp. 7–18, Apr. 2010. [2] M. Armbrust, A. Fox, R. Griffith, A. Joseph, and RH, “Above the clouds: A Berkeley view of cloud computing,” Univ. California, Berkeley, Tech. Rep. UCB , pp. 07–013, 2009. [3] G. Garrison, S. Kim, and R. L. Wakefield, “Success factors for deploying cloud computing,” Commun. ACM, vol. 55, no. 9, p. 62, Sep. 2012. [4] G. Garrison, R. L. Wakefield, and S. Kim, “The effects of IT capabilities and delivery model on cloud computing success and firm performance for cloud supported processes and operations,” Int. J. Inf. Manage., vol. 35, no. 4, pp. 377–393, Aug. 2015. [5] S. P. Ahuja, S. Mani, and J. Zambrano, “A Survey of the State of Cloud Computing in Healthcare,” Netw. Commun. Technol., vol. 1, no. 2, p. 12, Sep. 2012. [6] S. D. Burd, X. Luo, and A. F. Seazzu, “CloudBased Virtual Computing Laboratories,” in 2013 46th Hawaii International Conference on System Sciences, 2013, pp. 5079–5088. [7] J. J. M. Seddon and W. L. Currie, “Cloud computing and trans-border health data: Unpacking U.S. and EU healthcare regulation and compliance,” Heal. Policy Technol., vol. 2, no. 4, pp. 229–241, Dec. 2013. [8] A. Garcia, T. Batista, A. Rashid, and C. Sant’Anna, “Driving and managing architectural decisions with aspects,” ACM SIGSOFT Softw. Eng. Notes, vol. 31, no. 5, p. 6, Sep. 2006. [9] E. Afgan, J. Goecks, D. Baker, N. Coraor, A. Nekrutenko, and J. Taylor, “Galaxy - a Gateway to Tools in e-Science,” in Guide to e-Science, X. Yang, L. Wang, and W. Jie, Eds. Springer, 2011, pp. 145–177. [10] K. Krampis, T. Booth, B. Chapman, B. Tiwari, M. Bicak, D. Field, and K. Nelson, “Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community,” BMC Bioinformatics, vol. 13, p. 42, 2012. [11] E. Afgan, C. Sloggett, N. Goonasekera, I. Makunin, D. 
Benson, M. Crowe, S. Gladman, Y. Kowsar, M. Pheasant, R. Horst, and A. Lonie, “Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud.,” PLoS One, vol. 10, no. 10, p. e0140829, Jan. 2015. [12] E. Afgan, D. Benson, and N. Goonasekera, “Testdriven Evaluation of Galaxy Scalability on the Cloud,” in Galaxy Community Conference, 2014. MIPRO 2016/DC VIS A CAD Service for Fusion Physics Codes Marijo Telenta, Leon Kos and EUROFusion MST1 Team∗ University of Ljubljana, Mech. Eng., LECAD, Ljubljana, Slovenia marijo.telenta@lecad.fs.uni-lj.si, leon.kos@lecad.fs.uni-lj.si Abstract—There is an increased need for coupling machine descriptions to various fusion physics codes. We present a computer aided design (CAD) service library that interfaces geometrical data requested by fusion physics codes in completely programmatic way for use in scientific workflow engines. Fusion codes can request CAD geometrical data at different levels of details (LOD) and control major assembly parameters. This service can be part of the scientific workflow that delivers meshing of the CAD model and/or variation of the parameters. In this paper we present re-engineering of the ITER tokamak using an open source CAD kernel providing standalone library of services. Modelling of the machine is done with several LOD, starting from the rough one and building/replacing with more detailed models by adding more details and features. Such CAD modelling of the machine with LODs delivers flexibility and data provenance records for the complete CAD to physics codes workflow chain. I. I NTRODUCTION Several commercial “workbench”-style workflows can integrate CAD data for further analysis and parametric optimisation. Usually, the CAD data for use in analysis is prepared or extracted and simplified in such a way that it is then easily processed further on by meshing tools. Control of the CAD data output relies on proprietary plug-ins that allow extraction to some extent for interaction foreseen by the “integrators”. Depending on the workbench-suite, building a coupled simulation with external codes/solvers is only partially supported in general and does not offer required flexibility to “custom” physics codes used by the scientists. Importance of the “open” approach to data and results is often prerequisite for (re)use in scientific community and that rules out direct coupling to CAD packages within the “scientific” workflow engines such as Kepler [1]. Instead of “extracting” the machine descriptions from CAD models directly, most codes use manually prepared machine data for their input and have a “hard time” to verify its correctness with the CAD model. For each new machine to be introduced into the code, one needs to repeat the procedure of machine description creation by maintaining “internal” input format used by the code. Most of these custom input file formats are relatively simple and meant to be read in line with Fortran and other programming languages. Little attention is given to consistency checking of the input format. Most of the fusion-physics codes use “discretised” geometry for input by losing some “precision” in machine descriptions. Notable exceptions are Monte Carlo codes [2] which do ∗ See http://www.euro-fusionscipub.org/mst1 MIPRO 2016/DC VIS Fig. 1. ITER tokamak complex re-modelled with Open CASCADE kernel that provides diverse programmable geometric services from CAD data to fusion codes at several levels of details (LOD). 
Incremental LOD add features to assemblies while maintaining the physical correctness are one of the parameters of the CAD service. not use grids for determining the particle position but rather simple primitives (spheres, bounding blocks, etc.) and to some extent higher level primitives bounded by NURBS surfaces commonly used in CAD modelling. Such codes are even faster if the space is not discretised with dense “grids” which can easily increase computational complexity of positional checking. In principle, input grid density controls the computational complexity of the code and a proper balance of the grid and available time is always sought by the scientists to achieve acceptable accuracy. It should be noted that computational grid usually differs from input grid although both should be in “consensus” when introducing important geometrical details that are neglected with “rough” grids. Described rationale means that for scientific modelling several input grids are needed that correspond to the “latest” CAD model available as the only “true” input source from which grids or other geometrical primitives should be generated for use as a code input. CAD modelling process of the tokamak machine in Fig. 1 was divided into five assemblies: blanket, divertor, magnets, cryostat, and vessel. For every assembly different 303 level of details (LOD), leading geometrical dimensions, and other parameters such as number of sections are defined. The resulting CAD model is quite complex, with many parts, details and features. Physics codes need only specific parts of the CAD model prepared in a code-compatible format often called a “machine description”. Including the LOD parameter that provides increased number of parts and geometrical features to analyses of physics codes accuracy and corresponding computational complexity is usually not performed due to tedious input preparation. Manual reduction of features done by removing and simplifying unnecessary details in the CAD model is often called defeaturing. However, in this paper, opposite approach was taken. The CAD model was reverse engineered, i.e., the CAD model is re-modelled from bottom up, starting from simpler shape with basic features and modelling the geometry by adding features in parametric way. In this process different LOD’s are defined where each LOD has pre-determined specific complexity of the model. The complexity of the model is defined by the number of features and repetitive pieces used to create the CAD model. II. L EVEL O F D ETAILS (LOD) CAD model of the tokamak machine is reverse engineered using the Open CASCADE kernel. Open CASCADE [3] is an open source CAD kernel that offers programmatic flexibility in modelling and meshing of CAD models which mesh can be used as an input for fusion codes. The different levels of details are created in the process of programming the geometry with different CAD model complexity respectively. Basic shapes are modelled with little of geometrical details and with different number of segments as shown without the cryostat assembly in Fig. 2. The first LOD, LOD-0, includes the basic shape and features of the model. This LOD could be used for analysis where details are not as important and make the grid unnecessarily more complex. Several LODs for different sections are shown with different number of segments. This means that the physics codes could be supplied with varying LODs and segments in a programmatic way. 
Furthermore, the shapes may differ between LODs to follow physics requirements for the simulation. For example, in Fig. 2(a) the divertor is modelled in a “fused” way resembling basic shapes without any void space inside; Blanket and vessel in Fig. 2(b, c) are having no ports apertures. If holes are important for simulation then one can increase the LOD for that part only. If there is a specific requirement on the shape or number of holes one can “inherit” the code and add or remove features. Numeration of LOD doesn’t need to follow agreed levels of detail among assemblies if they are properly developed and documented. Users can consider the code that generates the CAD models as a starting point for specific modifications that can later on become a part of the CAD service. When shape of some part needs modifications due to precision of obeying the code requirements on the complexity, the same principles of LOD modifications can be followed. 304 (a) (b) (c) (d) Fig. 2. LOD-0 of the divertor segment (a), half revolution of the blanket (b), quarter revolution of vessel (c), and magnets with 3 segments. The geometry for all parts uses the basic shape without the features that are added or being replaced at a higher LOD. MIPRO 2016/DC VIS Drawback of the programmatic approach is clearly nongraphical representation of the models. For easier modifications and comprehension we are using Python “wrapping” of the PythonOCC [4] CAD kernel. The Python code is interpreted and skips the need of recompilation at source code modifications. CAD kernel performance by using Python is not significantly affected, as all kernel operations are written in C++. Repeating lengthy process of model creation with increasing LOD is even more comfortable with programming in Python than within a CAD package done interactively. Listing 1. Snippet where separate sections of the tokamak are called in programmatic way in Python. In addition, CAD service for cross section is called for divertor. (a) (b) divertor(display, level_of_detail, number_of_bodies, divertor_outer_radius, divertor_base_radius, divertor_outer_dimension, divertor_inner_dimension) magnets(display, level_of_detail, Magnet_Outer_Radius, Magnet_Inner_Radius, Magnet_System_Height, Number_Of_Elements) vessel(display, level_of_detail, NumberOfSlices, VacuumVesselInnerRadious, VacuumVesselOuterRadious, VacuumVesselUpperHeight, VacuumVesselBottomHeight) blanket(display, level_of_detail, revolve_for, height_above_ks, height_below_ks, inner_radius, outer_radius, top_outer_radius, top_inner_radius, shell_width) (c) cryostat(display, level_of_detail, cryostat_base_inner_radius, sections) Section(display, shape, angle) As one can see from the Python example in Listing 1, each assembly that outputs to display has several input parameters, such as LOD, number of segments, and parameters which define the geometry. In this way, desired level of details is set, and the CAD model geometry can be altered by changing the input parameters, for example, vessel’s inner and outer radius can be changed and all other dimensions of the vessel accordingly. Instead of outputting to display, one can choose to output shapes/assemblies and combine them into desired assembly for further operations on geometry. Operations, such as Section() in the last line of Listing 1 are part of geometry services provided by the library. Creation of the each assembly with PyhonOCC undergoes several steps in programming CAD kernel. Python code for each assembly is also self-consistent. 
It acts as a module that can be imported independently into other Python codes. Modules have additional geometric parameters such as dimensions and positions of the features that are not exposed but available for modifications by the users. The second level of details in Fig. 3 adds details and new shapes that were left out at LOD-0. Decision on what is included at each level is guided by the physics code requirements. As one can see, for the second LOD-1 apertures are modelled in the blanket and the vessel. The finest LOD2 shown in Fig. 4 defines the most detailed CAD geometry where further geometric details are added to the CAD model in reference to the LOD-1, such as fillets, fasteners, etc. MIPRO 2016/DC VIS (d) Fig. 3. LOD-1 of the divertor segment (a), quarter revolution of the blanket (b), quarter revolution of the vessel (c), and one segment of the magnets (d). Divertor is now split into two bodies with better description of the shapes. Apertures for diagnostic ports, antennas and piping are added to the blanket and the vessel. Toroidal magnets are more precisely described and cooling piping is added. 305 III. A CAD SERVICE (a) (b) The aim of the presented CAD service is to deliver a common interface for transfer of CAD data into machine descriptions for use by physics codes of several facilities (ITER, AUG, JET, MAST-U, TCV and W7-X). The ultimate goal is to control, in programmatic way, the disfeaturing process of the CAD model to the LOD. The product of the disfeaturing process could be than used as an input for different meshing codes and eventually meshed correctly. CAD SERVICE LIBRARY level of detail (LOD) attributes (material) parameters assembly sections OUTPUT RESULTS Meshing Closed surfaces PHYSICS CODES Cross-section ... VISUALISATION (c) (d) Fig. 4. LOD-2 of the divertor segment (a), half revolution of the blanket (b), quarter revolution of the vessel (c), and 3 segments of the magnets (d). Blankets that were previously modelled as a single shape with apertures are now separated and have gaps in-between. Fillets are added to vessel. Divertor was further detailed with holes. Magnets received details in piping and edge features. With the finest LOD-2 one can study influence of the tiny details to the simulations. The choice of what is included is still left to the user with the modification of the code. 306 Fig. 5. CAD service library coupling with physics codes that can control generated input. A CAD service, when used within a scientific workflow, as shown in Fig. 5, delivers an output in a format suitable for reading by general meshing tools. Meshing operation is treated as a black box that outputs the grid which is then used an input for physics codes. The output mesh, as well as, the results of the physics codes, which are calculated using these grids are stored together in a scientific database format. The mesh format from the meshing tools is favoured to be stored in a common fusion modelling grid description. Ideally, it should be possible all results to be stored in general grid description (GGD) format, as this enhances further processing and analysis of the results using visualisation tools that are available within the EUROFusion Integrated Modelling community. IGES and STEP standards are commonly used for the CAD data exchange and can serve as a common data format for input to meshing codes. Otherwise, if a “mesher” is unable to read the standard CAD format then this usually means that custom input needs to be prepared by the CAD service. 
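To make the hand-off to a mesher concrete, the short PythonOCC sketch below builds a placeholder shape, triangulates it and writes a STEP file. The torus geometry, the deflection value and the file name are arbitrary stand-ins chosen for illustration, and the module paths assume a recent pythonocc-core release (older releases omit the Core level in the import path).

from OCC.Core.BRepPrimAPI import BRepPrimAPI_MakeTorus
from OCC.Core.BRepMesh import BRepMesh_IncrementalMesh
from OCC.Core.STEPControl import STEPControl_Writer, STEPControl_AsIs

# Placeholder geometry standing in for one assembly at a chosen LOD
# (a torus very roughly resembling a vessel outline).
shape = BRepPrimAPI_MakeTorus(6.0, 2.0).Shape()

# Triangulate the shape; the linear deflection (0.05) controls how fine the
# surface triangulation is, i.e. it acts as a meshing-resolution knob.
BRepMesh_IncrementalMesh(shape, 0.05)

# Export in a standard CAD exchange format that most meshing tools can read.
writer = STEPControl_Writer()
writer.Transfer(shape, STEPControl_AsIs)
status = writer.Write("vessel_lod0.step")
print("STEP export status:", status)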
For example, physics code PFCFLUX [5] calculates the heat flux on the plasma facing components (PFC). Plasma-facing components are a generic term for any part of the tokamak machine where large amount of power deposition is found. The PFCFLUX needs as an input triangular mesh of the 3D PFC surfaces. The usual way in producing required mesh is to manually separate the PFC surfaces from the CAD geometry, fill any holes and gaps not needed on those surfaces, assign different materials to different surfaces, and heal any surfaces if needed. The CAD service library can supply the PFCFLUX code with needed input data in programmatic way which could save time and effort in preparing the mesh. Since PFCFLUX needs only surface triangles, Open CASCADE kernel can be used to build the triangular mesh [6]. Similar approach was MIPRO 2016/DC VIS (a) (b) (c) Fig. 6. CAD service for cross-section of divertor with (a) at LOD-0, (b) at LOD-1, and (c) at LOD-2. taken in SMARDDA software library which offers possibility to perform design-relevant calculations for PFCs [7]. The CAD service is missioned as a library or a module which would be then installed and used as a service for specific physics code. Different CAD service library/module would be created for different tokamak machines. It can provide closed surfaces, needed for analysis, for example, in plasma facing components. In addition, the service could offer, as mentioned above, different LODs which would satisfy the specific needs of the physics code considered. It can give specific attributes of the model such as material. Also, basic mesh can be provided or different cross sections of the model. Finally, depending on the need, different assembly sections and parameters can be provided to the mesher and/or physics codes. If provided data to the physics codes are not sufficient or not valid, CAD service library can re-provide requested data in appropriate form. One can see from Fig. 6 the product of the CAD service for cross section of the divertor with different levels of details. This service is used by the physics codes which use 2D geometry for calculations such as SOLPS-ITER [8]. IV. C ONCLUSION Programming CAD kernel Open CASCADE is used to model CAD geometry of a tokamak machine from simple shape to more complex one. In the process of creating the geometry, different levels of details are defined and programmed. Product is a CAD service library/module which can by request serve physics codes with appropriate data. Variety of the services could be offered depending on the physics code used, such as closed surfaces, cross sections, basic mesh, and material. Numerous physics codes could potentially have benefit using this CAD service. The time spent on preparing the CAD model for meshing will be significantly shorter when using this library. Also, provenance is recorded such that others could repeat the procedure to produce the particular mesh in question for the physics code. Reverse engineering approach for CAD geometry modelling presents opposite approach of CAD model disfeaturing. “Manual” preparation of the CAD models is tedious task that can take several months. With this service many of the preparation steps for simplification are avoided and can be fine-grained and uniformly controlled. Usual simplification steps include geometric simplification, MIPRO 2016/DC VIS suppression/hide of irrelevant details, and decomposition in LODs of complex parts. Future work will include additional tokamaks, services and actors for Kepler. 
ACKNOWLEDGEMENT This work has been carried out within the framework of the EUROfusion Consortium and has received funding from the Euratom research and training programme 2014-2018 under grant agreement No. 633053, task agreement AWP15-EEGJSI/Telenta. The views and opinions expressed herein do not necessarily reflect those of the European Commission. The authors would like to acknowledge R. Pitts, S. Pinches and X. Bonnin from ITER for helpful discussions. Thanks goes to many students from the Faculty of Mechanical Engineering at the University of Ljubljana that have contributed to this work. R EFERENCES [1] I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludascher, and S. Mock, “Kepler: An extensible system for design and execution of scientific workflows,” in SSDBM ’04: Proceedings of the 16th International Conference on Scientific and Statistical Database Management. Washington, DC, USA: IEEE Computer Society, 2004, p. 423. [2] L. Lu, U. Fischer, Y. Qiu, and P. Pereslavtsev, “The CAD to MC geometry conversion tool McCad: Recent advancements and applications,” in ANS MC2015 - Joint International Conference on Mathematics and Computation (M&C), Supercomputing in Nuclear Applications (SNA) and the Monte Carlo (MC) Method, 2015. [3] “OpenCASCADE – Open CASCADE technology,” http://opencascade. org, 2016. [4] “PythonOCC website – Open CASCADE for Python,” http://pythonocc. org, 2016. [5] M. Firdaouss, V. Riccardo, V. Martin, G. Arnoux, C. Reux, and J.-E. Contributors, “Modelling of power deposition on the jet iter like wall using the code pfcflux,” Journal of Nuclear Materials, vol. 438, Supplement, pp. S536 – S539, 2013, proceedings of the 20th International Conference on Plasma-Surface Interactions in Controlled Fusion Devices. [Online]. Available: http://www.sciencedirect. com/science/article/pii/S0022311513001190 [6] K. L. Telenta, M, R. Akers, and E.-I. Team, “Interfacing of CAD models to a common fusion modelling grid description,” in Proceedings of the International Conference Nuclear Energy for New Europe, Portorož, Slovenia, 2015, pp. 707.1 – 707.8. [7] W. Arter, V. Riccardo, and G. Fishpool, “A cad-based tool for calculating power deposition on tokamak plasma-facing components,” Plasma Science, IEEE Transactions on, vol. 42, no. 7, pp. 1932 – 1942, sept. 2014. [8] S. Wiesen, D. Reiter, V. Kotov, M. Baelmans, W. Dekeyser, A. Kukushkin, S. Lisgo, R. Pitts, V. Rozhansky, G. Saibene, I. Veselova, and S. Voskoboynikov, “The new solps-iter code package,” Journal of Nuclear Materials, vol. 463, pp. 480 – 484, 2015. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0022311514006965 307 Correlation between attenuation of 20 GHz satellite communication link and Liquid Water Content in the atmosphere Maks Kolman* and Gregor Kosec** *student at Jožef Stefan Insitute, Department of Communication Systems, Ljubljana, Slovenia **Jožef Stefan Insitute, Department of Communication Systems, Ljubljana, Slovenia maks.kolman@student.fmf.uni-lj.si, gregor.kosec@ijs.si Abstract – The effect of Liquid Water Content, i.e. the mass of the water per volume unit of the atmosphere, on the attenuation of a 20 GHz communication link between a ground antenna and communication satellite is tackled in this paper. The wavelength of 20 GHz electromagnetic radiation is comparable to the droplet size, consequently the scattering plays an important role in the attenuation. To better understand this phenomenon a correlation between measured LWC and attenuation is analysed. 
The LWC is usually estimated from the pluviograph rain rate measurements that captures only spatially localized and ground level information about the LWC. In this paper the LWC is extracted also from the reflectivity measurements provided by a 5.6 GHz weather radar situated in Lisca, Slovenia. The radar measures reflectivity in 3D and therefore a precise spatial dependency of LWC along the communication link is considered. The attenuation is measured with an in-house receiver Ljubljana Station SatProSi 1 that communicates with a geostationary communication satellite ASTRA 3B on the 20 GHz band. I. I NTRODUCTION The increasing demands for higher communication capabilities between terrestrial and/or earth-satellite repeaters requires employment of frequency bands above 10 GHz [1]. Moving to such frequencies the wavelength of electromagnetic radiation (EMR) becomes comparable to the size of water droplets in the atmosphere. Consequently, EMR attenuation due to the scattering on the droplets becomes significant and ultimately dominant factor in the communications quality [2]. During its propagation, the EMR waves encounter different water structures, where it can be absorbed or scattered, causing attenuation [1]. In general, water in all three states is present in the atmosphere, i.e. liquid in form of rain, clouds and fog, solid in form of snow and ice crystals, and water vapour, which makes the air humid [3]. Regardless the state, it causes considerable attenuation that has to be considered in designing of the communication strategy [2]. Therefore, in order to effectively introduce high frequency communications into the operative regimes, an adequate knowledge about atmospheric effects on the attenuation has to be elaborated. In this paper we deal with the attenuation due to the scattering of EMR on a myriad of droplets in the atmosphere that is characterised by LWC and drop size distribution (DSD). A discussion on the physical background of the DSD can be found in [4], where authors describe basic 308 mechanisms behind distribution of droplets. Despite the efforts to understand the complex interplay between droplets, ultimately the empirical relations [5] are used. The LWC and DSD can be related to the only involved quantity that we can reliable measure, the rain rate [6]. Recently it has been demonstrated that for high rain rates also the site location plays a role in the DSD due to the local climate conditions [7]. In general, raindrops can be considered as dielectric blobs of water that polarize in the presence of an electric field. When introduced to an oscillating electric field, such as electromagnetic waves, a droplet of water acts as an antenna and re-radiates the received energy in arbitrary direction causing a net loss of energy flux towards the receiver. Some part of energy can also be absorbed by the raindrop, which results in heating. Absorption is the main cause of energy loss when dealing with raindrops large compared to the wavelength, whereas scattering is predominant with raindrops smaller than the wavelength [2]. The very first model for atmospheric scattering was introduced by lord Rayleigh [8]. The Rayleigh assumed the constant spatial polarization within the droplet. Such simplifications limits the validity of the model to only relatively small droplets in comparison to the wavelength of the Incident field, i.e. approximately up to 5 GHz when EMR scattering on the rain droplets is considered. 
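To put a number on this limit, the following short calculation (an illustration added here, assuming a typical 2 mm drop diameter) compares the size parameter x = πD/λ at 5 GHz and at 20 GHz; the Rayleigh picture assumes x << 1.

from math import pi

C = 3.0e8     # speed of light [m/s]
D = 2.0e-3    # assumed typical raindrop diameter of 2 mm [m]

for f_ghz in (5.0, 20.0):
    wavelength = C / (f_ghz * 1e9)      # [m]
    x = pi * D / wavelength             # dimensionless size parameter
    print(f"{f_ghz:4.0f} GHz: lambda = {wavelength * 100:4.1f} cm, x = {x:.2f}")

# At 5 GHz x stays around 0.1, while at 20 GHz it grows to roughly 0.4,
# so the constant-polarization (Rayleigh) assumption becomes questionable.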
A more general model was developed by Mie in 1908 [2], where a spatially dependent polarization is considered within the droplet, extending the validity of the model to higher droplet size/EMR wavelength ratios. Later, a popular empirical model was presented in [9], where the attenuation is related only to the rain rate. The model, also referred to as the Marshall-Palmer model, is widely used in the evaluation of LWC from reflectivity measured by weather radars [10]. The Marshall-Palmer model simply states the relation between the attenuation and the rain rate in terms of a power function. In this paper we seek a correlation between the LWC and attenuation measurements. The LWC is extracted from reflectivity measurements provided by a weather radar situated in Lisca and operated by the Slovenian Environment Agency [11]. Attenuation is measured by in-house hardware that monitors the signal strength between the Ljubljana Station SatProSi 1 and the communication satellite ASTRA 3B [12, 13, 14, 15]. The main purpose of this paper is therefore to investigate the correlation between precipitation measured in 3D with the meteorological radar and the measured attenuation.

II. GOVERNING MODELS

Before we proceed to measurements some basic relations are discussed.

Attenuation (A) is a quantity measured in [dB] that describes the loss of electromagnetic radiation propagating through a medium. It is defined with the starting intensity Is and the intensity received after propagation Ir as

A = 10 log10(Is / Ir) .    (1)

The specific attenuation (α = A/L) measured in [dB/km] as a function of the rain rate (R) measured in [mm/h] is commonly modelled as [5]

α(R) ≈ a R^b .    (2)

Coefficients a and b are determined empirically by fitting the model to the experimental data. In general, the coefficients depend on the incident wave frequency and polarization, and on the ambient temperature. Some example values for different frequencies are presented in Table 1.

TABLE 1. VALUES OF THE COEFFICIENTS FOR THE MARSHALL-PALMER RELATION α(R) AT DIFFERENT FREQUENCIES [5].
f [GHz]   10       12       15       20       25       30
a         0.0094   0.0177   0.0350   0.0722   0.1191   0.1789
b         1.273    1.211    1.143    1.083    1.044    1.007

Drop size distribution (DSD) is a quantity that, unsurprisingly, describes the distribution of droplet sizes. The simplest characterization of rain is through the rain rate R, measured in [mm/h]. However, the rain rate does not give any information about the type of rain. For example, a storm and a shower might have the same rain rate, but different drop size distributions. A simple DSD model is presented in [9]:

N(D) = U exp(−V R^δ D) ,    (3)

where D stands for the drop diameter measured in [mm], N(D) describes the number of droplets of size D to D + dD in a unit of volume measured in [mm^−1 m^−3] and R is the rain rate measured in [mm/h]. The values of the equation parameters were set to U = 8.3 · 10^3, V = 4.1 and δ = −0.21. The DSD was also determined experimentally for different rain rates [5]. The experimental data is presented in Figure 1, where we can see that the typical diameter of drops is in the range of mm. There is a discrepancy between the theoretical and experimental data for very small droplets. This can be fixed with a modified DSD, however scattering on such small droplets is negligible so the difference is not relevant.

Figure 1. DSD measured in the Czech Republic (one year measurement, rain rate R is the parameter of particular sets of points) [5]. Lines represent the theoretical value as determined by (3).
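As a worked illustration of equations (2) and (3) — added here, not part of the original analysis — the short Python sketch below evaluates the specific attenuation and the drop size distribution for an assumed moderate rain rate, using the 20 GHz coefficients from Table 1.

from math import exp

# Specific attenuation, eq. (2): alpha(R) = a * R**b, 20 GHz coefficients from Table 1.
a, b = 0.0722, 1.083
R = 10.0                      # assumed rain rate [mm/h], illustrative value only
alpha = a * R ** b
print(f"alpha(20 GHz, R = {R} mm/h) = {alpha:.2f} dB/km")

# Exponential DSD, eq. (3): N(D) = U * exp(-V * R**delta * D).
U, V, delta = 8.3e3, 4.1, -0.21
for D in (0.5, 1.0, 2.0, 3.0):            # drop diameters [mm]
    N = U * exp(-V * R ** delta * D)
    print(f"D = {D:.1f} mm -> N(D) = {N:9.1f} mm^-1 m^-3")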
III. MEASUREMENTS

A. Measurements of signal attenuation

Jožef Stefan Institute (JSI) and the European Space Agency (ESA) cooperate in the SatProSi-Alpha project that includes measuring the attenuation of the communication link between on-ground antennas and a satellite, more precisely between the ASTRA 3B satellite and the SatProSi 1 station. ASTRA 3B is a geostationary communication satellite located at 23.5° E longitude over the equator. It broadcasts the signal at 20 GHz, which is received at SatProSi 1 with an in-house receiver, namely a 1.2 m parabolic antenna positioned on the top of the JSI main building with a gain of about 47 dB. SatProSi measures the attenuation every 0.15 seconds, resulting in over 500000 daily records, since 1 October 2011.

B. Measurements of rainfall rate

Two sources of rain measurements are used in this paper. The first one is a pluviograph installed locally in the proximity of the antenna. The rain rate is measured every five minutes. Another, much more sophisticated, source of rain characteristics is provided by meteorological radars. The basic idea behind such radars is to measure EMR that reflects from water droplets. The measured reflectivity is then related to the rain rate through the Marshall-Palmer relation.

The radar reflectivity factor Z is formally defined as the sum of sixth powers of drop diameters over all droplets per unit of volume, which can be converted into an integral

Z = ∫_0^∞ N(D) D^6 dD .    (4)

Note that the form of the relation follows the Rayleigh scattering model [16]. Z is usually measured in units of mm^6 m^−3. When conducting measurements a so-called Equivalent Reflectivity Factor

Ze = η λ^4 / (0.93 π^5)    (5)

is used, where η means reflectivity, λ is the radar wavelength and 0.93 stands for the dielectric factor of water. As the name suggests, both are equivalent for wavelengths large compared to the drop sizes, as in the Rayleigh model [16]. The reflectivity factor and the rainfall rate are related through the Marshall-Palmer relation as

Z[mm^6 m^−3] = ã R[mm/h]^b̃ ,    (6)

where Z[mm^6 m^−3] is the reflectivity factor measured in mm^6 m^−3 and R[mm/h] is the rainfall rate measured in mm/h. In general, the empirical coefficients ã and b̃ vary with location and/or season, however, they are independent of the rainfall R. The most widely used values are ã = 200 and b̃ = 1.6 [9, 10]. Meteorologists rather use a dimensionless logarithmic scale and define

dBZ = 10 log10(Z / Z0) = 10 log10 Z[mm^6 m^−3] ,    (7)

where Z0 is the reflectivity factor equivalent to one droplet of diameter 1 mm per cubic meter.

The meteorological radars at Lisca emit short (1 µs) electromagnetic pulses with a frequency of 5.62 GHz and measure the strength of the reflection from different points in their path. The radars collect roughly 650000 spatial data points per radar per atmosphere scan, which they do every 10 minutes. They determine the exact location of all their measurements through their direction and the time it takes for the signal to reflect back to the radar. In addition to reflectivity, the radars also measure the radial velocity of the reflecting particles by measuring the Doppler shift of the received EMR, but this is a feature we will not be using.
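The reflectivity-to-rain-rate conversion implied by equations (6) and (7) is simple enough to state as code; the sketch below is our own illustration, using ã = 200, b̃ = 1.6 and arbitrary example readings.

# Convert radar reflectivity (dBZ) to rain rate via the Marshall-Palmer Z-R relation,
# eqs. (6)-(7): Z = a_t * R**b_t and dBZ = 10*log10(Z/Z0).

A_TILDE = 200.0   # empirical coefficient a~
B_TILDE = 1.6     # empirical exponent b~


def dbz_to_rain_rate(dbz: float) -> float:
    """Return the rain rate R [mm/h] implied by a reflectivity of `dbz` dBZ."""
    z = 10.0 ** (dbz / 10.0)                 # reflectivity factor in mm^6 m^-3
    return (z / A_TILDE) ** (1.0 / B_TILDE)  # invert Z = a~ * R**b~


if __name__ == "__main__":
    for dbz in (20.0, 30.0, 40.0, 50.0):     # illustrative radar readings
        print(f"{dbz:4.0f} dBZ -> R = {dbz_to_rain_rate(dbz):6.2f} mm/h")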
IV. DATA ANALYSIS

The analysis begins with handling approximately 20 GB of radar data for the academic year 2014/15, accompanied by 3 GB of signal attenuation data for the same time period and approximately 5 GB of attenuation and local rain gauge data for the years 2012 and 2013.

A. Preprocessing the radar spatial data

Radar data was firstly reduced by eliminating spatial points far away from our point of interest, namely the JSI main building where the antenna is located. The geostationary orbit is 35 786 km above sea level, therefore the link between the antenna and the satellite has a steep elevation angle of 36.3°. In fact, just 20 km south of the antenna the ray rises above 15 km, which is the upper boundary for all weather activities [3]. Knowing this, a smaller area of the map can be safely cropped out, reducing the number of data points from around 650 000 to approximately 6500 for each radar scan, covering a 40 km × 40 km area.

Although we already greatly reduced the original data size, we must still reduce thousands of points into something tangible. The positions of both the antenna and the satellite are known at all times, a lovely consequence of them being stationary, therefore the link between them can be easily traced. Roughly 150 points on the ray path are used as a discrete representation of the link, referred to as link points in future discussions. For each link point a median of the n closest radar measurements is computed as a representative value. The other way of extracting the reflectivity factor was simply to take the n points closest to the antenna and select the median value of those. A visualisation of both methods is presented in Figure 2.

Figure 2. Positions of radar measurements. The blue rectangle is the location of the antenna and the rain gauge. The 64 points closest to the antenna are enclosed in a red sphere and marked as red circles. Red dots mark the remainder of the 512 closest points. The green line is the ray path between antenna and satellite with green circles representing corresponding support nodes for support size n = 4.

Figure 3. Measured antenna attenuation and rain rate extracted from the 64 radar measurements closest to the antenna. Both datasets have been sorted into 30 minute bins.

B. Correlation between rain and attenuation

In order to find a relation between rain rate and electromagnetic attenuation, measurements of both quantities must be paired. There is no obvious way of doing this since both are measured at vastly different time-scales. We ended up dividing time into bins of duration t0 and pairing the measurements that fall within the same bin. The maximum value of every quantity was selected as a representative for the given time period.

Now we are left with multiple scalar quantities as a function of time: antenna attenuation every 0.15 s, the local rain gauge every 5 min and various extractions of the reflectivity factor every 10 min. Note that the radar values are not averaged over 10 minutes; the radar simply needs 10 minutes to complete a single scan. In Figure 3 an example of the rainfall rate measured with the weather radar and the measured attenuation for a three day period is presented. A correlation between the quantities is clearly seen on the figure but a closer inspection is needed to reveal more details about the correlation.

The correlation coefficient between two variables X and Y can be calculated using

corr(X, Y) = mean((X − mean(X)) · (Y − mean(Y))) / (std(X) std(Y))    (8)

and is a good quantity for determining linear dependence between X and Y.
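A compact sketch of the pairing procedure just described — bins of duration t0, the maximum per bin as the representative value, and the correlation coefficient of equation (8) — is given below; the NumPy implementation and the placeholder data are ours, not taken from the measurement campaign.

import numpy as np


def bin_maximum(times_s, values, edges):
    """Reduce an irregular time series to its maximum value inside each time bin."""
    idx = np.digitize(times_s, edges)
    return np.array([values[idx == i].max() if np.any(idx == i) else np.nan
                     for i in range(1, len(edges))])


def correlation(x, y):
    """Correlation coefficient of eq. (8), skipping bins where either series is empty."""
    ok = ~np.isnan(x) & ~np.isnan(y)
    x, y = x[ok], y[ok]
    return np.mean((x - x.mean()) * (y - y.mean())) / (x.std() * y.std())


# Illustrative use: three days of data, attenuation every 0.15 s and radar-derived
# rain rate every 10 min, both reduced to 8-hour bins (t0 = 8 h) before correlating.
t0, span = 8 * 3600, 3 * 86400
edges = np.arange(0, span + t0, t0)
t_att = np.arange(0, span, 0.15)
t_rain = np.arange(0, span, 600)
att = np.abs(np.random.randn(t_att.size))    # placeholder measurements, not real data
rain = np.abs(np.random.randn(t_rain.size))
print(correlation(bin_maximum(t_att, att, edges), bin_maximum(t_rain, rain, edges)))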
According to the Marshall-Palmer power law a linear relation exists between the logarithms of rain rate and specific attenuation. Our measurements are of the total attenuation A and not of the specific attenuation, so we must adjust the equation. We assume a typical distance L as a connecting factor between the two, which gives us

log10 A = log10(L a) + b log10 R .    (9)

The exact value of L is not relevant as only the parameter b will interest us. Therefore the slope on a log-log graph, such as the one in Figure 4, is equal to the model parameter b. We used a least-squares linear fit on each set of data to get the corresponding values of b. In addition, the correlation between logarithmic values of rain rate and attenuation,

corr(log10 A[dB], log10 R[mm/h]) ,    (10)

is used as a quality measure of their relation.

V. RESULTS AND DISCUSSION

Once we paired the attenuation and rainfall data we can scatter the points on a graph. In Figure 4 the attenuation against rain rate at an 8 h bin size is presented. For the local radar representation a support size of n = 2^6 and for the integral representation n = 2^2 is used. The correlation can be clearly seen, however it is not unified, as one would expect if the measurements and the rain rate–reflectivity model were perfect. Since we introduced two free parameters, namely the time bin t0 and the spatial support size n for the integral and local radar representations, a sensitivity analysis regarding those parameters is needed.

Figure 4. Attenuation dependency on the rain rate measured in three different ways: local rain gauge (blue), path integration on each step selecting the closest 4 points (green) and from the 64 points closest to the antenna (red). All measurements have been put into 8 h bins.

In Figure 5 the correlation with respect to the number of local support nodes and the time bin size is presented. The best correlation is obtained with 8 h time bins and a local support size of n = 2^6.

Figure 5. Correlation between rain rate and attenuation with respect to the local support size n and the time bin size t0.

Similarly, the correlation with respect to the number of integral support nodes and the time bin size is presented in Figure 6. Again, the best correlation is obtained with 8 h time bins, however with the integral model a small integral support, i.e. n = 2^2, already suffices to obtain a fair correlation. Such a behaviour is expected. In the integral mode we follow the ray and the support moves along with it, therefore there is no need to capture vast regions for each link point. On the other hand, in the local approach only one support is used and therefore that support has to be much bigger to capture enough detail about the rain conditions.

Figure 6. Correlation between rain rate and attenuation with respect to the integral support size n and the time bin size t0.

To compare the measurements acquired with the radar and the ones acquired with the local rain gauge, a simpler presentation of the correlation is shown in Figure 7. One set of data has the rain rate extracted from the radar using the integral method with support size 4 and two sets use either the closest 64 or 512 nodes.

Figure 7. Correlation between rain rate and attenuation as a function of time bin size t0 for different ways of extracting the rain rate.

In the next step we compare our measurements with the Marshall-Palmer model, specifically the exponent b. According to [9], at 20 GHz the value b0 = 1.083 should hold.
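The exponent b of equation (9) is obtained from an ordinary least-squares fit of a straight line to the log-log pairs; a minimal sketch with synthetic placeholder data (not the measured values) is shown below.

import numpy as np

# Fit log10(A) = log10(L*a) + b * log10(R), eq. (9): the slope of a straight line
# through the log-log scatter is the model exponent b.
rng = np.random.default_rng(0)
R = rng.uniform(0.5, 30.0, size=200)               # placeholder binned rain rates [mm/h]
A = 3.0 * R ** 1.1 * rng.lognormal(0.0, 0.2, 200)  # placeholder binned attenuations [dB]

b_fit, intercept = np.polyfit(np.log10(R), np.log10(A), 1)
print(f"fitted exponent b = {b_fit:.3f} (Marshall-Palmer value at 20 GHz: b0 = 1.083)")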
In Figure 8 the differences between our measurements and b0 with respect to the time bin size are presented for the same sets of data as were used in the correlation analysis of Figure 7. An order of magnitude improvement is visible between the local rain gauge and the data extracted with the radar.

Figure 8. Exponent in the attenuation to rainfall relation b compared to the value b0 from Table 1 for 20 GHz as a function of bin duration t0 for a few ways of extracting rainfall.

VI. CONCLUSION

This paper deals with the correlation between the attenuation of the EMR signal due to scattering on the ASTRA 3B - SatProSi 1 link and the measured rain rate. The main objective of the paper is to analyze the related measurements and to compare the results with the Marshall-Palmer model. The attenuation is measured directly with in-house equipment with a relatively high time resolution (0.15 s). The rain characteristics are measured with a rain gauge positioned next to the antenna and with a national meteorological radar. The rain gauge measures the average rain rate every five minutes at a single position, while the radar provides a full 3D scan of reflectivity every 10 minutes. Although the attenuation depends mainly on the DSD, the rain rate is used as a reference quantity, since it is much more descriptive, as well as easier to measure. The reflectivity measured with the radar is therefore transformed to the rain rate with the Marshall-Palmer relation. A more direct approach would be to relate the attenuation with the measured reflectivity directly, however that would not change any of the conclusions, since, on a logarithmic scale, a simple power relation between reflectivity and rain rate reflects only as a linear transformation. The analysis showed that the correlation depends quite strongly on the support size and time bin size. It is demonstrated that a time bin of 8 h and support sizes of n = 2^6 and n = 2^2 for the local and integral approach, respectively, provide a decent correlation (0.6 − 0.7) between the logarithms of measured attenuation and rain rate. Furthermore, the power model has been fitted over the measured data and the value of the exponent has been compared to the values reported in the literature. The model shows the best agreement with the Marshall-Palmer model when the rain rate is gathered from the integral along the communication link. Somewhat worse agreement is achieved with a local determination of the rain rate. Results obtained with the rain gauge are the furthest from the expected, despite the fact that the correlation with the measured attenuation is the highest with the rain gauge measurements. The localized information from the rain gauge simply cannot provide enough information to fully characterize the rain conditions along the link. There are still some open questions to resolve, e.g. what is the reason behind the 8 h time bin giving the best result, how could we improve the correlation, etc. All these topics will be addressed in future work.

ACKNOWLEDGMENT

The authors acknowledge the financial support from the state budget by the Slovenian Research Agency under Grant P2-0095. Attenuation data was collected in the framework of the ESA-PECS project SatProSi-Alpha. The Slovenian Environment Agency provided us with the data collected by their weather radars.

REFERENCES

[1] J. Goldhirsh and F.L. Robison. Attenuation and space diversity statistics calculated from radar reflectivity data of rain. Antennas and Propagation, IEEE Transactions on, 23(2):221–227, 1975.
[2] M. Tamošiunaite, S. Tamošiunas, M. Žilinskas, and M. Tamošiuniene. Atmospheric attenuation due to humidity. Electromagnetic Waves, pages 157–172, June 2011.
[3] J. Rakovec and T. Vrhovec. Osnove Meteorologije. Društvo Matematikov Fizikov in Astronomov Slovenije, 2000.
[4] E. Villermaux and B. Bossa. Single-drop fragmentation determines size distribution of raindrops. Nature Physics, 5(9):697–702, September 2009.
[5] O. Fiser. The role of DSD and radio wave scattering in rain attenuation. Geoscience and Remote Sensing New Achievements, 2010.
[6] P. Pytlak, P. Musilek, E. Lozowski, and J. Toth. Modelling precipitation cooling of overhead conductors. Electric Power Systems Research, 81(12):2147–2154, 2011.
[7] S. Das, A. Maitra, and A.K. Shukla. Rain attenuation modeling in the 10-100 GHz frequency using drop size distributions for different climatic zones in tropical India. Progress In Electromagnetics Research B, 25:211–224, 2010.
[8] C.R. Nave. Blue sky and Rayleigh scattering. http://hyperphysics.phy-astr.gsu.edu/hbase/atmos/blusky.html. Accessed: 2015-12-5.
[9] J.S. Marshall and W.McK. Palmer. The distribution of raindrops with size. Journal of Meteorology, pages 165–166, August 1948.
[10] R. Uijlenhoet. Raindrop size distributions and radar reflectivity–rain rate relationships for radar hydrology. Hydrology and Earth System Sciences, pages 615–627, August 2001.
[11] Slovenian Environment Agency (Agencija Republike Slovenije za Okolje).
[12] A. Vilhar, G. Kandus, A. Kelmendi, U. Kuhar, A. Hrovat, and M. Schönhuber. Three-site Ka-band diversity experiment performed in Slovenia and Austria. In Antennas and Propagation (EuCAP), 2015 9th European Conference on, pages 1–5. IEEE, 2015.
[13] U. Kuhar, A. Hrovat, G. Kandus, and A. Vilhar. Statistical analysis of 19.7 GHz satellite beacon measurements in Ljubljana, Slovenia. In The 8th European Conference on Antennas and Propagation, held at the World Forum in The Hague, The Netherlands, on 6-11 April 2014, EuCAP 2014, pages 944–948, 2014.
[14] A. Vilhar, G. Kandus, A. Kelmendi, U. Kuhar, A. Hrovat, and M. Schönhuber. Three-site Ka-band diversity experiment performed in Slovenia and Austria. In The 9th European Conference on Antennas and Propagation, Montreal, Lisbon, Portugal, on 12-17 April 2015, EuCAP 2015, page 5, 2015.
[15] C. Kourogiorgas, A. Kelmendi, A. Panagopoulos, S.N. Livieratos, A. Vilhar, and G.E. Chatzarakis. Rain attenuation time series synthesizer based on copula functions. In The 9th European Conference on Antennas and Propagation, Montreal, Lisbon, Portugal, on 12-17 April 2015, EuCAP 2015, page 4, 2015.
[16] Lord F.R.S. Rayleigh. XXXIV. On the transmission of light through an atmosphere containing small particles in suspension, and on the origin of the blue of the sky. Philosophical Magazine Series 5, 47(287):375–384, 1899.

Practical Implementation of Private Cloud with traffic optimization

D. G. Grozev, M. P. Shopov, N. R. Kakanakov
Technical University of Sofia / Department of Computer Systems and Technologies, Plovdiv, Bulgaria
dgrozev@vmware.com, mshopov@tu-plovdiv.bg, kakanak@tu-plovdiv.bg

Abstract - This paper presents a practical implementation of a private cloud, based on VMware technology, optimized to support CoS and QoS (even when an overlay technology like VXLAN is used) in the field of smart metering in electrical power systems and IoT. The use of cloud computing technologies increases the reliability and availability of the system.
All routing, firewall rules and NATs are configured using NSX. The implementation of CoS and QoS in the virtual and physical network will guarantee the necessary bandwidth for normal operation among the other virtualized services.

I. INTRODUCTION

Cloud services and the Internet of Things (IoT) are among the most discussed terms in recent research topics. They represent the two extremes of computer science – from small distributed devices to large computational infrastructure. The two topics are used together, as IoT is one of the important sources of data that should be processed in large volumes [1, 2, 3]. But the use of cloud technologies in IoT is not limited to data processing and storage. The infrastructure can be used to deploy virtual devices and to ease their management and the dissemination of configurations. Deployment of virtual devices in the cloud will lead to new services: sensing-as-a-service or even sensor-as-a-service [4, 5].

A sustainable energy future depends on an efficient, reliable and intelligent electricity distribution and transmission system, i.e., the power grid. The smart grid has been defined as an automated electric power system that monitors and controls grid activities, ensuring the two-way flow of electricity and information between power plants and consumers – and all points in between [6]. The smart power grid is a native application of IoT and cloud technologies, as the intelligent meters are highly distributed and generate a huge amount of measurement data.

Texas Instruments proposes a smart grid architecture with intelligent meters and data concentrators. A data concentrator is the core of the energy management framework. It provides the technology to measure and collect energy usage data. The concentrator can also be programmed to analyse and communicate this information to the central utility database. Not only can the utility providers use this information for billing services, they can also improve customer relationships through enhanced consumer services such as real-time energy analysis and communication of usage information. Additional benefits of fault detection and initial diagnosis can also be achieved, further optimizing the operational cost [7].

All data processing and manipulation services such as data concentrators could benefit from the elasticity of cloud platforms and should be built to be very scalable. Nevertheless, exporting them to online processing clusters could reduce privacy and increase the cost of communication. The best alternative in such cases is building a private cloud. Nowadays most companies have their own cloud platform and virtual infrastructure, where they run business-critical applications and store data. The advantages of the private cloud include:
• Reliability and scalability. All resources are virtualized and, in case of demand, we can add more storage or computing capacity without downtime or impact.
• Fast provisioning. Using techniques like templates, we can deploy thousands of machines with a few clicks.
• Automation. Pretty much everything can be automated using specific command-line tools or a REST API.
• Common user interface. Decoupling the computation infrastructure from the input system enables multiple user interfaces to exist side by side, allowing user-centric customization.

In this paper, an architecture for a private cloud for energy measurements and data processing for the smart power grid is presented.
The architecture consists of a private cloud that hosts data storage, management and presentation services, together with a network of virtualized sensors. The paper discusses techniques to build and/or tune a private cloud to host device management software along with other virtualized services.

II. BACKGROUND

A. Traffic optimization using the NSX distributed logical router

There are two types of traffic in the cloud: North-South and East-West. North-South traffic is usually ingress/egress traffic. This includes traffic from/to the Internet and the intranet. It also includes traffic between VMs from different port groups and networks that needs to be routed by layer 3 physical devices. East-West traffic is VM or management traffic that bounces between hypervisors (ESXi) [8].

With a Distributed Logical Router (DLR) deployment we move the routing functionality to the hypervisor (kernel level) and effectively remove the sub-optimal traffic path. Each ESXi host can route between subnets at line rate or nearly line rate speed. This means that if we have two VMs from different networks, but on the same host, and we use DLR, the communication between them won't leave the host.

The presented work is supported by the National Science Fund of Bulgaria project under contract Е02/12.

The idea behind DLR is not a new one, but it is unique in the virtualization world. All data centre switches have a separation between the control and the data plane. This allows network engineers to restart or update the control plane while the data plane keeps working, with no downtime. DLR has two components. The first one is the DLR Control VM, which is a virtual machine, and the second one is the DLR kernel module that runs in every ESXi hypervisor. The DLR kernel module is called "route-instance" and holds the same copy of the information in each ESXi host [10].

DLR has three types of interfaces: Uplink, Logical Interface (LIF) and Management. The Uplink is used by the DLR Control VM to connect to upstream routers and is called the "transit" interface between the physical and the logical space. DLR supports OSPF and BGP on its uplinks, but you cannot run both at the same time. LIFs are interfaces that each ESXi host has in its kernel. They are layer 3 interfaces and act as the default gateway for all VM traffic connected to logical switches. The Management interface is the second interface on the DLR Control VM; the idea again comes from data centre switches, where the management interface is a separate physical port. This allows all management traffic to be separated into an out-of-band (OOB) network. Services that should run on the management interface are SSH, Syslog and SNMP.

Propagation of routing information in DLR is complex. Initially, all routing information is obtained and stored in the Control VM, where the routing daemons actually run. It is then sent to the NSX Controller, which pushes it into each ESXi host kernel via a permanent SSL-encrypted channel. The DLR LIFs have the same IP addresses and vMACs in each ESXi host. The vMACs are not visible in the physical network. Fig. 1 shows all DLR components and the relations between them.

If we move routing into the cloud using NSX ESRs, the physical routers and layer 3 switches will be offloaded, but the traffic between hypervisors won't be reduced. A virtual router (ESR) lives on a single host, which means that VM traffic will reach that hypervisor via the layer 2 network, will be routed there, and will go back to the host where the destination VM is. With DLR, the netcpa process on each host stores the routing information, and when the communicating virtual instances are on the same host, no network traffic is sent to the physical network.
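To make the East-West saving concrete, the short C sketch below is a purely illustrative model of the rule just described; the types and the decision logic are simplifications for this paper's discussion, not part of NSX. With a centralized ESR all routed traffic must pass through the host running the router VM, whereas with DLR routing happens in the local ESXi kernel, so only VM pairs on different hosts generate traffic on the physical network.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative model only: a VM is reduced to the host it runs on and
   the logical subnet it is attached to. */
typedef struct { int host_id; int subnet_id; } vm_t;

/* Returns true when a packet between the two VMs must cross the physical
   network, following the behaviour described in the text:
   - same subnet, same host: switched locally inside the hypervisor;
   - different subnets with DLR: routed in the local kernel, so it leaves
     the host only when the VMs are on different hosts;
   - different subnets with a centralized ESR on esr_host: the traffic
     detours via that host even when both VMs share a hypervisor. */
static bool crosses_physical_network(vm_t a, vm_t b, bool use_dlr, int esr_host)
{
    if (a.subnet_id == b.subnet_id)          /* plain L2 switching */
        return a.host_id != b.host_id;
    if (use_dlr)                             /* kernel-level routing */
        return a.host_id != b.host_id;
    return !(a.host_id == b.host_id && a.host_id == esr_host);
}

int main(void)
{
    vm_t broker = { .host_id = 1, .subnet_id = 10 };
    vm_t logger = { .host_id = 1, .subnet_id = 20 };

    printf("ESR: %s\n", crosses_physical_network(broker, logger, false, 3)
                            ? "leaves the host" : "stays local");
    printf("DLR: %s\n", crosses_physical_network(broker, logger, true, 3)
                            ? "leaves the host" : "stays local");
    return 0;
}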
It is always easy to put all VMs in a single port group and network and get rid of routing altogether. This design is possible only if there are no security requirements. Separating VMs into different networks gives us full control over the exchanged traffic.

B. Traffic optimization using vDS features: shaping, NIOC, reservations and limits

The backbone of each cloud is the virtual switch to which all VMs are connected. With a distributed virtual switch (vDS) we can optimize traffic using the following features: shaping, where a shaper can be applied on each port group, and Network I/O Control (NIOC). The NIOC concept revolves around resource pools that are similar in many ways to the ones already existing for CPU and memory. NIOC classifies traffic into six predefined resource pools: vMotion, iSCSI, fault tolerance (FT) logging, management, NFS and virtual machine traffic. With this feature, each traffic type is configured with shares. In case of congestion on the host's physical NIC, each traffic type receives bandwidth equal to the available physical bandwidth multiplied by the ratio of its shares to the sum of the shares of all traffic types participating in the congestion. There are three layers in NIOC (Fig. 2): teaming policy, shaper and scheduler [9].

There is a new teaming method called LBT (load-based teaming). It basically detects how busy the physical network interfaces are and then moves flows to different cards. LBT will only move a flow when the mean send or receive utilization on an uplink exceeds 75 percent of capacity over a 30-second period, and it will not move flows more often than every 30 seconds.

There are two attributes (Shares and Limit) that one can control over traffic via Resource Allocation. Resource Allocation is controlled per vDS and only applies to that vDS; it applies at the vDS level, not at the port group or dvUplink level. The shaper is where limits apply: it limits traffic per class of traffic. Each vDS has its own resource pools, and resource pools are not shared between vDSs. Shares apply at the dvUplink level, and the share rates are calculated based on the traffic of each dvUplink. This controls the share of traffic going through a particular dvUplink and ensures that the share percentages are correct. This concept allows flexible partitioning of networking capacity and helps users deal with overcommitment when flows compete aggressively for the same resources.

Fig. 1. DLR configuration distribution [9]

NIOC ensures traffic isolation, so that a given flow will never be allowed to dominate over others, thus preventing drops and undesired jitter. For each traffic type we can configure reservations and limits.
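The share arithmetic described above can be written down directly. The following C sketch only illustrates that rule; the pool names and share values are assumed examples (apart from the 100-share VXLAN pool configured later in Section IV), and idle pools are excluded from the sum, since only traffic types actually sending during congestion compete for bandwidth.

#include <stdio.h>

/* Illustrative NIOC-style resource pool: shares only matter for pools
   that are actually sending traffic while the physical NIC is congested. */
typedef struct { const char *name; int shares; int active; } pool_t;

/* Bandwidth granted to pool i under congestion, in Mbit/s:
   link_mbps * shares_i / sum(shares of all active pools). */
static double granted_mbps(const pool_t *pools, int n, int i, double link_mbps)
{
    int total = 0;
    for (int k = 0; k < n; k++)
        if (pools[k].active)
            total += pools[k].shares;
    if (!pools[i].active || total == 0)
        return 0.0;
    return link_mbps * (double)pools[i].shares / (double)total;
}

int main(void)
{
    /* Assumed example values: a custom VXLAN pool with 100 shares (as in
       Section IV) and system pools left at a default of 50 shares each. */
    pool_t pools[] = {
        { "vxlan",      100, 1 },
        { "vmotion",     50, 1 },
        { "management",  50, 0 },   /* idle pools do not take a share */
    };
    int n = (int)(sizeof pools / sizeof pools[0]);

    for (int i = 0; i < n; i++)
        printf("%-11s %8.1f Mbit/s\n", pools[i].name,
               granted_mbps(pools, n, i, 10000.0));   /* 10 GbE uplink */
    return 0;
}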
Fig. 2. Network I/O Control [10]

C. Classification and marking, which prepares traffic for further processing in the physical network

Usually CoS and QoS are not configured in the cloud. If they are applied at all, it is on physical network devices, where classification, marking and policing happen. Nowadays VMware network virtualization supports CoS and QoS; they are relatively new features that come with vSphere 5.5. Enabling CoS and QoS allows us to mark the traffic at the vDS level and offload the physical network. We have to configure the physical switches to trust the CoS and QoS values.

III. USE-CASE SCENARIO: ENERGY MANAGEMENT SYSTEM IN THE CLOUD

A. Architecture of the proposed energy management system in the cloud

The proposed architecture consists of: a message broker supporting communication between the sensing elements and the application and storage elements; a NoSQL data store (Data Logger) that stores the raw measurements from the sensing devices; an application server that provides a platform for running the data manipulation and knowledge extraction scripts; a relational DB server for storing processed data; a presentation server that runs software for preparing the data visualisation – tables, graphics, maps; and multiple virtual sensor nodes, each one representing a real measurement unit. The virtual sensor nodes should be a replica of the physical sensor, keeping the data extracted from the real sensor and providing an interface for its configuration. The servers in the infrastructure communicate with the virtual sensors for configuration or data extraction, and the synchronization between the physical and virtual sensor is done in isolation from the others, with protocols specific to the sensor. The logical topology used is shown in Fig. 3.

Fig. 3. Logical topology

B. Private cloud at the Technical University of Sofia, Plovdiv branch

For the implementation of the energy management system, a private cloud has been built at the Technical University of Sofia, Plovdiv branch. It utilizes five hosts with a total of 83 GHz of CPU, 182 GB of RAM and more than 6 TB of storage (4 Dell PowerEdge 1950 hosts and one HP ProLiant D80 G7). The hosts are interconnected via two Cisco 3750 Gigabit access switches and one top-of-rack data centre switch AS4600-54T-C (with Cumulus Linux), and they use a Fujitsu Eternus DX90 S2 disk storage system connected through two Brocade Fibre Channel DS-5000B switches. All hosts run the ESXi 5.5 operating system, are organised in one cluster and are registered in one vCenter. All management services run on a separate cluster. The physical topology of our cloud architecture is shown in Fig. 4.

The presented cloud infrastructure can provide different cloud services to the energy measurement system. Apart from the obvious IaaS, where the user can deploy his own virtual machines, or PaaS, where the user can use his own preconfigured instances of a message broker, data storage and virtual devices, the following specific cloud services are used:
• Data-as-a-Service – users can extract data from the Data Logger or the RDB to apply their own algorithms for analysis;
• Sensing-as-a-Service – users can extract raw data from sensor measurements;
• Sensor-as-a-Service – users can use the existing virtual sensor infrastructure to apply monitoring and control functions.

IV. TRAFFIC OPTIMIZATION IN THE PRIVATE CLOUD

A. Traffic types and classes in the use case scenario

In our private cloud we recognize the following types of traffic:
1. Management – this includes all traffic generated by the virtual infrastructure: virtual machine migration (vMotion), connections to the network storage (iSCSI, NFS), fault tolerance, user interfaces for management (Web GUI, PowerCLI), etc.
2. Virtual machine own traffic – this is the traffic generated by all VMs connected to VLAN-based port groups in the distributed virtual switches.
3. VXLAN traffic – this includes the traffic generated by all virtual machines connected to virtual logical switches.

Within the VM traffic we have the following classes (an illustrative class-to-marking table is sketched after the list):
• Reading sensors;
• Configure sensors;
• Store measurements;
• Extract data series for manipulation;
• Store processed data in the RDB;
• Extract data for presentation;
• Export data.
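As referenced above, the C table below sketches one possible class-to-marking assignment. The CoS/DSCP values are hypothetical placeholders chosen only to illustrate the idea of per-class marking on the vDS VXLAN port group; the actual values configured in the described deployment are not published in this paper.

#include <stdio.h>
#include <string.h>

/* Hypothetical class-to-priority table for the traffic classes listed
   above. The CoS/DSCP values are illustrative placeholders only. */
typedef struct { const char *cls; int cos; int dscp; } marking_t;

static const marking_t table[] = {
    { "reading-sensors",      5, 40 },
    { "configure-sensors",    5, 40 },
    { "store-measurements",   4, 32 },
    { "extract-data-series",  3, 24 },
    { "store-processed-rdb",  3, 24 },
    { "extract-presentation", 2, 16 },
    { "export-data",          1,  8 },
};

/* Look up the marking for a class name; returns NULL when unknown, in
   which case traffic would keep the port group's default marking. */
static const marking_t *lookup(const char *cls)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].cls, cls) == 0)
            return &table[i];
    return NULL;
}

int main(void)
{
    const marking_t *m = lookup("store-measurements");
    if (m != NULL)
        printf("%s -> CoS %d, DSCP %d\n", m->cls, m->cos, m->dscp);
    return 0;
}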
The first two classes are between the message broker node and the virtual devices. The third class is between the message broker and the Data Logger. The fourth class is between the Data Logger and the application server. The fifth class is between the application server and the RDB server. The sixth class is between the RDB and the presentation server. The seventh class is between the data centre and external Internet services.

Fig. 4. Physical topology

B. Preliminary results and discussion

The applied optimizations in the private cloud at the technical university include:
• NIOC is enabled on the vDS that spans the hosts running the virtual services of the energy management system. We have added a custom-defined network resource pool for VXLAN and configured it with 100 shares and no limit. This guarantees that in case of congestion the VXLAN traffic will have enough bandwidth. Here we omit any QoS settings, because we want the outer VXLAN QoS to match the original traffic. All management traffic is placed in system resource pools with the default shares – 50 shares per pool.
• DLR is used to route traffic between the different DMZ zones. If two heavily communicating VMs are on the same physical host, the traffic does not leave the host and the physical network is offloaded.
• DRS affinity rules are added in vCenter to keep VMs that have to exchange large amounts of data with each other on the same host.
• CoS and QoS rules are added on the vDS VXLAN port group to prepare the traffic for prioritization in the physical network. We create exact rules that select traffic between VMs by source/destination IP and protocol ports. Traffic shaping is applied per port group; having this in mind, we group the VMs into appropriate port groups.

Some initial tests were done to prove the workability of the system. On the physical cluster, several virtual machines with Ubuntu 14.04 Server are started. They represent the different services specific to smart metering: the NoSQL data store (Data Logger); the application server for data manipulation and knowledge extraction; the relational DB store for storing extracted data series and/or data patterns; and the presentation server for data visualization. Virtual devices are represented by virtual machines running Ubuntu Snappy. Additional virtual devices can be deployed on emulated ARM processors using QEMU. These VMs are configured in different subnets according to the logical topology (Fig. 3).

The configuration of the DLR is the same on all VMs. The routing table of one VM is shown in Fig. 5. On each VM several LIFs are configured and a multicast group is assigned to each; the description of one LIF is given in Table I.

Fig. 5. DLR routing table on a host

TABLE I. DESCRIPTION OF A LIF IN VMS
Mode:                    Routing, Distributed, Internal
ID:                      Vxlan:100005
IP:                      172.17.7.254
Connected Dvs:           dglabs
VXLAN Multicast IP:      239.0.0.5
DHCP Relay Server List:  172.17.0.17

For testing the optimization, traffic generators are started on different VMs and some preliminary results are obtained. The tests gather information about the delay and jitter of VXLAN traffic when the source and sink run on the same ESXi host or on different ones. These values are compared to check whether DLR works as expected.

TABLE II. DELAY FOR ONE FLOW BETWEEN TWO VMS*
                   min      max      avg
One host           274.2    283.0    274.3
Separate hosts     836.8    842.5    837.0
* Results are in microseconds (μs)

V. CONCLUSION

The paper discusses the traffic optimization techniques that can be used in a cloud infrastructure with virtual networking (VXLAN). These techniques are applied in a use case scenario – an architecture for an energy management system in a cloud. The presented architecture is implemented in a private cloud infrastructure at the Technical University of Sofia, Plovdiv branch, in the virtual laboratory for Distributed Systems and Networking (http://dsnet-tu-plovdiv.bg). The proposed implementation allows building all layers of an energy management system using cloud technologies, which eases administration, adds scalability and reliability, and is general enough to be applied in a wide range of scenarios.

The preliminary results show that using DLR for routing between VXLANs reduces the overall traffic in the cluster network, and that applying NIOC lowers the ratio of dropped and delayed packets due to high network loads. The results are preliminary, and more detailed experiments should be made to check the influence of the prioritization of one traffic type on the others and to obtain detailed numerical values for the reduction of traffic and delay. These results can be verified in different scenarios with different distributions and loads of the traffic classes.

ACKNOWLEDGMENT

The presented work is supported by the National Science Fund of Bulgaria project "Investigation of methods and tools for application of cloud technologies in the measurement and control in the power system" under contract Е02/12 (http://dsnet.tu-plovdiv.bg/energy/).

REFERENCES

[1] Alamri A, Ansari WS, Hassan MM, Hossain MS, Alelaiwi A, Hossain MA, "A survey on sensor-cloud: architecture, applications, and approaches," International Journal of Distributed Sensor Networks, Article ID 917923, Feb 2013, doi:10.1155/2013/917923.
[2] Beng, L., "Sensor cloud: Towards sensor-enabled cloud services," Intelligent Systems Center, Nanyang Technological University, 2009.
[3] Botta, A.; de Donato, W.; Persico, V.; Pescape, A., "On the Integration of Cloud Computing and Internet of Things," in Future Internet of Things and Cloud (FiCloud), 2014 International Conference on, pp. 23-30, 27-29 Aug. 2014.
[4] Zaslavsky, A., C. Perera, D. Georgakopoulos, "Sensing as a service and big data", Proceedings of the International Conference on Advances in Cloud Computing (ACC), Bangalore, India, July 2012, pp. 21-29, arXiv preprint (1301.0159).
[5] Rao, B.B.P.; Saluia, P.; Sharma, N.; Mittal, A.; Sharma, S.V., "Cloud computing for Internet of Things & sensing based applications," in Sensing Technology (ICST), 2012 Sixth International Conference on, pp. 374-380, 18-21 Dec. 2012.
[6] Wu L., G. Kaiser, C. Rudin, R. Anderson, "Data Quality Assurance and Performance Measurement of Data Mining for Preventive Maintenance of Power Grid", Columbia University Academic Commons, 2011, http://hdl.handle.net/10022/AC:P:12174.
[7] P. Prakash, "Data concentrators: The core of energy and data management", Texas Instruments white paper, Copyright © 2013, Texas Instruments Incorporated.
[8] VMware NSX Technical Product Management Team, "VMware NSX for vSphere Network Virtualization Design Guide ver 3.0", Aug 21, 2014, online [04.02.2016]: www.vmware.com/files/pdf/products/nsx/vmw-nsx-network-virtualization-design-guide.pdf
[9] B. Hedlund, "Distributed virtual and physical routing in VMware NSX for vSphere", The Network Virtualization Blog, November 25, 2013, online [04.02.2016]: https://blogs.vmware.com/networkvirtualization/2013/11/distributed-virtual-and-physical-routing-in-vmware-nsx-for-vsphere.html
[10] V. Deshpande, "Network I/O Control (NIOC) Architecture – Old and New", VMware vSphere Blog, January 25, 2013, online [01.02.2016]: https://blogs.vmware.com/vsphere/tag/nioc
GLOSSARY

VXLAN (Virtual Extensible LAN) – a virtual network that emulates an Ethernet broadcast domain.
NSX – VMware NSX is the network virtualization platform for the Software-Defined Data Center (SDDC).
DRS (Distributed Resource Scheduler) – technology for balancing the computing capacity of a cluster to deliver optimized performance for hosts and virtual machines.
VDS – the VMware vSphere Distributed Switch provides a centralized interface from which you can configure, monitor and administer virtual machine access switching for the entire data center.
DLR – Distributed Logical Router. It separates the control and data plane: the control plane is a VM, while the data plane is part of the hypervisor kernel.
vMotion – VMware vSphere live migration allows you to move an entire running virtual machine from one physical server to another, without downtime.

Improving Data Locality for NUMA-Agnostic Numerical Libraries

P. Zinterhof
University of Salzburg / Dept. of Computer Sciences, Salzburg, Austria
peter.zinterhof3@sbg.ac.at

Abstract – Modern multi-socket servers are frequently based on the NUMA paradigm, which offers scalability but also introduces potential challenges. Many numerical libraries have not been designed for such architectures specifically, so their node-level performance relies heavily on the inherent quality of their parallelization (e.g. OpenMP) and on the use of highly specialized tools and techniques such as thread pinning and memory page placement. In this paper we propose a simple and portable framework for improving the performance of NUMA-agnostic numerical libraries by controlling not only the location of threads but also the location of allocated memory. In addition to many related approaches, we also apply fine-grained control over memory placement, implemented by means of the operating system's first-touch principle.

I. INTRODUCTION

We see a steady trend towards ever increasing performance on all levels of computational hardware. It is a well-established fact that with current silicon-based technology the performance of typical microprocessors cannot be increased any more just by raising their clock rates. This is mainly owed to the fact that doubling the clock rate roughly equals a quadrupling of the power consumption. In such a setting, proper cooling quickly becomes either technologically unviable, as the resulting energy density on the surface of the chips would require very advanced and expensive cryo-techniques, or is ruled out by simple economic rationality. The latter is especially the case in the area of supercomputing, where the power consumption of an installation is already one of the main limiting factors. Instead, performance increases are to be achieved by several design factors that are selectively combined by hardware designers. On the core level one sees larger cache memories and more powerful instructions that rely on advanced techniques (e.g. speculative execution, instruction re-ordering, etc.) and wide execution units for vector processing.
On the chip level we observe ever increasing numbers of parallel cores, located within a single die and connected to each other by caches or special bus systems. This level also marks the beginning of the non-uniform memory architecture (NUMA). On the system-level multiple CPUs are integrated into single systems, or, as in the case of large NUMA-machines, 320 multiple systems seamlessly integrate into a single, potentially very large system that is operated under a single system-wide operating system kernel. Each level offers performance advantages but also potential pitfalls. In this paper we focus on performance optimization on cache coherent NUMA-systems, as they constitute the main building blocks of many current high-performance computers and compute clusters. Certainly, there is a wide range of computational workloads on cluster installations, expressed by an even greater range of software applications. It is our conjecture, that many of these applications are based on external numerical libraries, which in some cases might still be amenable to improvements. Despite much work in the field and a growing awareness of the problems in achieving high performance, software development and performance tuning can be still tricky, especially when knowledge, expertise, or proper tools are scarcely available. It is the goal of this paper to demonstrate an easy path to performance improvements in the use of numerical libraries. That is, we take a zero-knowledge approach with regards to the concepts and implementation details of these libraries, but concentrate on the orchestration of working threads and memory allocations, both of which can be controlled to some extent at the level below the actual library functions. We will base our work on the well-known basic linear algebra subroutines-library (BLAS) that is a tremendously successful numerical library, important to many fields of research as well as application design. The BLAS-library already supports symmetric multi-processors by means of OpenMP, but other than that is agnostic to the special characteristics of NUMA-based systems. Nevertheless, there do exist tools for thread-pinning that allow certain threads of the parallel execution to be pinned to certain cores of the CPUs. Henceforth, the allocation of working memory for BLAS-operations and the distribution of it within a system comprising several NUMA-nodes is either left to the application or to some allocation policy that is defined at the start of the application. Therefore the proposed framework aims at the proper allocation of memory without requiring intricate knowledge of the inner structure and workings of the BLAS-library and at some more detailed level than offered by most available tools. It seems important to note MIPRO 2016/DC VIS that our framework uses BLAS as a test case but is not being limited to this library. II. RELATED WORK A. Research Improving performance for applications on parallel computers is an important area both in research and development. The importance is certainly augmented by the increasing number of high-performance machines worldwide and the fact that almost every new system today is based on multi-core CPUs. One approach to optimize performance is to find optimal placements for multiple threads onto a multi-core CPU. In [1] this placement is guided by hardware performance counters, that have been introduced to modern x86-based CPUs. 
The proposed autopin-tool[1] is probing several different settings of thread-scheduling until an optimum is declared and being used for scheduling of the application for the remaining runtime. A similar “on-the-fly”-optimization technique will be shown in our framework, too. It has also been shown [2] that there often is no globally optimal thread placement in real-life applications, as for different parallel execution blocks in the code different – locally adjusted – scheduling will be superior. This local adjustment is also the main path we pursue in our proposed framework, although we concentrate on the placement of memory pages in main memory in conjunction with the standard thread placement that is governed by OpenMP. Another interesting feature of the approach described in [2] is the operation on binary executable files instead of the source code of the application. This certainly allows for an easier deployment and possibly wider adoption among users, as no source code has to be available and no recompilation takes place. As pointed out in [3], improving data-locality is only one – albeit important – factor with respect to performance. There are situations in which an imbalance in the layout of pages will not hurt performance. In special cases, one can even rely on an aggregation von L3 caches in multi-processor environments to even gain super-linear speedups [4]. On the contrary, even in well-balanced layouts performance might be sub-optimal due to congestion of memory links that introduce additional access latencies. There is no single optimum solution for these kinds of problems, since the answer also involves other aspects such as the tradeoff between time-to-solution vs. speedup and the decision between manual tuning and auto-tuning. Depending on the setting different paths might be regarded to be superior to others. There is also an interesting relation of energy efficiency of a computation and its performance as pointed out in [5], where high cache utilization is enforced by binding threads to cores either by the pthread-API but also by means of the likwid toolset as in our own setting. In general, improvements on data-locality not only support shorter times to solution but also increase energy efficiency and as a direct consequence of the latter even higher performing computer systems become feasible within the same power envelope. B. Frameworks and Tools MIPRO 2016/DC VIS Amongst solutions for thread-pinning and control of memory allocation at the level of libraries there is a range of tools which offer support for these tasks at the system level. This latter approach removes the need of making decisions by the software developer, but enables to exert control over certain aspects of the run-time behavior of an application by the user or system administrator.  Likwid Likwid [6] is a set of tools for performance optimization on NUMA-systems. It can display all kinds of hardware(topology) information, read out hardware performance counters and energy meters. It can also be used to pin threads to cores for the runtime of an application. We will base our experiments (section IV Results) on Likwid.  Taskset Taskset is a command usually included in Linux distributions. It enables the user to retrieve or set processor affinities for a parallel program. An application will only be run on a set of cores that can be specified by some mask. 
For instance, the command ‘taskset –c 1,2,8,16 application.exe’ will ensure that the 4 parallel threads of applications.exe will not be executed on any other cores that 1,2,8, and 16. It is not entirely clear, whether affinities are fixed during the run time of the job or whether the kernel is still allowed to change affinities of threads within the requested bounds of cores 1,2,8, and 16. If one application is intended to be confined to a single CPU socket or NUMA-node (there is no strict 1:1 mapping between the two terms as there frequently can be found more than one NUMA-node on a single CPU socket) the above question seems irrelevant. Whereas in the case of high performance computing we most likely want to employ all available cores on the machine which would render an affinity set of <1,2,3, .., MaxCore> rather redundant under the assumption that threads would still be allowed to move between cores.  Numactl Numactl is comparable to the taskset command in Linux, but this command adds functionality by allowing to bind memory to specific NUMA-nodes. Also, it offers a more relaxed way of specifying the location of threads by using the coarser term of “NUMA-node” instead of core numbers. With the intention of locking threads to closely connected memory in order to increase data locality, it is sufficient to tell the system that a specific set of threads shall be confined to a certain NUMA-node as all cores within a NUMA-node enjoy the same memory access characteristics.  AutoNUMA AutoNUMA [7] is both a memory and thread placement tool that has been included in the Linux kernel. By keeping track of an application's memory accesses is determines an online strategy for page migration which is 321 then executed by a separate kernel thread. Unfortunately, some distributions need to be patches before using AutoNUMA, thus rendering the software less portable. III. IMPROVING DATA LOCALITY Our framework aims at performance improvements by locally adjusting memory distribution over different NUMA-nodes by exploiting the well-known first-touch policy in Linux. Under first-touch policy memory pages of some newly allocated block of memory will first be mapped to some Zero-page, such that reading from any position of the block will result in bit patterns 0. Not until the first write operation the operating system will decide which NUMA-node the actual page of memory will be physically placed on. Linux will take that NUMA-node to hold the page that is closest to the thread which is making the write. Hence Linux already provides high data locality for the given thread and block of memory under the rationale that the first-calling thread will continue to make frequent use of a memory page afterwards. Furthermore, memory distribution is being controlled in a fine-grained fashion by selectively choosing NUMAnodes for all arrays that will be used by BLAS within a given application program. In general we find two ways of providing and initializing arrays for use with BLAS.   322 Allocation with explicit initialization. Explicit initialization of arrays can be the consequence of the applications ‘natural’ dataflow (e.g. some input data have to be stored in memory before BLAS operations can take place on them) or the personal programming style of the developer. A important factor will be whether this initialization is accomplished by a sequential thread or a pool of parallel threads, especially on a system employing first-touch policy. 
A sequential thread will typically force all pages to be physically located on its local NUMA node, until this node is exhausted and follow-up pages 'spill over' onto neighboring NUMA nodes. Initialization within a 'parallel' block in OpenMP will lead to a balanced distribution of pages over a certain number of NUMA nodes, as OpenMP always aims to provide a balanced distribution of tasks.

Allocation without initialization. When no explicit initialization precedes the use of some memory block or array by BLAS, the operating system will choose a NUMA node on the basis of the system-wide placement policy. This may be either the above-mentioned first-touch policy or the round-robin policy, depending on the Linux kernel settings.

In a typical application we do not see a single call to some library function but a series of calls, often embedded in a dynamic control flow. Here one can encounter a series of different library functions repeatedly operating on the same block of memory, with each function potentially favoring a different memory placement for its optimal performance; but as the placement of memory is done only once, during the first-touch phase, this placement might prove inadequate for some of the subsequently invoked library functions. This can affect performance negatively, as a high degree of data locality cannot always be ensured in such a setting.

In order to deal with that problem, our framework not only enables control of memory placement at the level of single pages in a multi-threaded (OpenMP) environment but also allows that placement to be changed dynamically, albeit not in an automatic fashion. Control of placement is based on the definition of the topology of the placement, followed by a function which employs a set of parallel threads to force the physical mapping of pages onto the target NUMA nodes (Fig. 1). This mapping has to be accomplished by way of parallel threads in order to exploit the first-touch policy, which states that a page will be physically stored on the NUMA node of which the calling thread is part during the first write operation to that page. The placement is defined as a vector, with each element denoting the identification of the thread which is to execute the first touch on the corresponding memory page. We base our experiments on a Linux kernel with a page size of 4096 bytes. The function 'touch' (Fig. 1) applies a write operation to the array 'memblock' at memory position i*4096, thus forcing ownership of page i to the NUMA node that corresponds to the calling thread 'TID'.

void placePages(int *placementVector, char *memblock, long number_of_pages)
{
    long i;
    int TID;
    #pragma omp parallel private(i, TID)
    {
        TID = omp_get_thread_num();
        for (i = 0; i < number_of_pages; i++) {
            if (TID == placementVector[i])
                touch(memblock, i * 4096);
        }
    }
}

Figure 1: OpenMP based page placement

By freeing and reallocating some memory block, the placement can be changed at will during runtime. This operation could be substituted with the page migration function of the NUMA library, but with maximum portability in mind the above approach has been favored. The operation imposes very little overhead, as all memory use in a NUMA system has to go through some initialization anyway. The only overhead is given by the parallel for loop, which seems rather negligible, never exceeding 8 ms of wall time in our experiments.
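A minimal, self-contained C sketch of the free-and-retouch cycle described above is given below. It mirrors the placePages routine of Fig. 1 and is meant as an illustration of the mechanism only, not as the framework's actual code; in particular, whether free() really returns the pages to the operating system (so that first-touch applies again on reallocation) depends on the allocator in use.

#include <stdlib.h>
#include <omp.h>

#define PAGE_SIZE 4096L

/* First write into page i: under the first-touch policy the page is then
   physically placed on the NUMA node of the calling thread. */
static void touch_page(char *memblock, long i) { memblock[i * PAGE_SIZE] = 0; }

/* Same idea as Fig. 1: the thread named in placement[i] performs the
   first write to page i. */
static void place_pages(const int *placement, char *memblock, long npages)
{
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        for (long i = 0; i < npages; i++)
            if (placement[i] == tid)
                touch_page(memblock, i);
    }
}

/* Change the placement at runtime as described in the text: free the
   block, allocate it again and re-touch it according to the new vector.
   Live data would have to be copied out and back by the caller. */
static char *replace_placement(char *memblock, long npages, const int *new_vec)
{
    free(memblock);
    char *fresh = malloc((size_t)(npages * PAGE_SIZE));
    if (fresh != NULL)
        place_pages(new_vec, fresh, npages);
    return fresh;
}

int main(void)
{
    long npages = 1024;
    int threads = omp_get_max_threads();
    int *vec = malloc(npages * sizeof *vec);
    if (vec == NULL) return 1;

    for (long i = 0; i < npages; i++)            /* round-robin placement */
        vec[i] = (int)(i % threads);
    char *block = malloc((size_t)(npages * PAGE_SIZE));
    if (block == NULL) { free(vec); return 1; }
    place_pages(vec, block, npages);

    for (long i = 0; i < npages; i++)            /* new target: thread 0 */
        vec[i] = 0;
    block = replace_placement(block, npages, vec);

    free(block);
    free(vec);
    return 0;
}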
Three methods of deriving proper placement vectors have been employed:
• random placement,
• genetic optimization¹,
• Monte Carlo based random search.

The "random placement" method places pages on randomly chosen NUMA nodes. It works surprisingly well (see Section IV, Results). By subjecting the placement of pages to optimization by means of a genetic algorithm we also aim to increase data locality and overall system performance. The genetic algorithm is based on a simulated population of genetic individuals that encode different entries of the placement vectors. By selectively recombining and permuting individuals under evolutionary pressure, the quality of the placement is improved from generation to generation. This algorithm is based on the assumption that for any two individuals of the population a clear distinction of their respective performance can be made. As it turns out, pinning threads to specific cores still allows for some variance in execution times, which effectively counteracts this necessary distinction and thus hampers the applicability of the genetic algorithm. As a result, the genetic algorithm shows similarities to another method, which can be called Monte Carlo based random search. Our framework therefore offers support for performance sampling (based on the Monte Carlo method) for an arbitrary number of repetitions of calls to a BLAS function. Over time, this random search typically yields placement vectors of increasing quality. The user then has to provide some stopping criterion that concludes the search operation, after which the best placement vector found is employed for all subsequent calls to that BLAS function. It has to be noted that the search operation can be integrated within the normal operation of the application ('on-the-fly' optimization), so that no serious overhead is incurred. Placement vectors can also be stored and retrieved for later use.

¹ Experiments have been conducted by means of the Genetic Algorithm Utility Library, http://gaul.sourceforge.net, with an initial population of 128 individuals under a Darwinian scheme with crossover/mutation/migration probabilities of 0.03/0.09/0.07.
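As a concrete illustration of the 'on-the-fly' Monte Carlo search described above, the following self-contained C sketch repeatedly draws a random placement vector, allocates and first-touches a block accordingly, times one call to a stand-in kernel, and keeps the fastest placement seen. The stand-in kernel, the block size and the number of samples are placeholders only; in the framework itself the timed call would be the BLAS routine under optimization and the stopping criterion would be supplied by the user.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <omp.h>

#define PAGE 4096L

/* Stand-in for one call to the BLAS routine being tuned; in the real
   setting this would operate on matrices stored in `block`. */
static void kernel_standin(char *block, long bytes)
{
    #pragma omp parallel for
    for (long i = 0; i < bytes; i += 64)
        block[i]++;
}

/* First-touch the block according to `vec`, as in Fig. 1. */
static void place_pages(const int *vec, char *block, long npages)
{
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        for (long i = 0; i < npages; i++)
            if (vec[i] == tid)
                block[i * PAGE] = 0;
    }
}

int main(void)
{
    long npages = 4096, bytes = npages * PAGE;   /* 16 MiB test block */
    int threads = omp_get_max_threads(), samples = 20;
    int *trial = malloc(npages * sizeof *trial);
    int *best  = malloc(npages * sizeof *best);
    double best_time = 1e30;
    if (trial == NULL || best == NULL) return 1;

    for (int s = 0; s < samples; s++) {
        for (long i = 0; i < npages; i++)
            trial[i] = rand() % threads;         /* random owner thread */

        char *block = malloc((size_t)bytes);     /* fresh pages, so      */
        if (block == NULL) break;                /* first-touch applies  */
        place_pages(trial, block, npages);

        double t0 = omp_get_wtime();
        kernel_standin(block, bytes);
        double dt = omp_get_wtime() - t0;
        free(block);

        if (dt < best_time) {                    /* remember best vector */
            best_time = dt;
            memcpy(best, trial, npages * sizeof *best);
        }
    }
    printf("best sampled time: %.6f s\n", best_time);
    free(trial); free(best);
    return 0;
}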
IV. RESULTS

For assessment we applied the framework to three operations (dgemm, dgesv, dsytrd+dorgtr+dsteqr) of the BLAS library in double precision (64 bit). The BLAS functions have been applied to randomized input data and linked against the Intel MKL v11.2 library on a quad-socket SuperMicro system consisting of 4 AMD Opteron 6386 SE CPUs (64 cores in total) and 512 GB RAM, which is arranged in the form of 8 NUMA nodes. The likwid-pin [6] utility has been built for the employed CentOS 6.3 Linux system and subsequently used to pin 64 OpenMP threads to the 64 physical cores of the test system. Each experiment has been repeated 10 times and the reported numbers denote the average performance or runtime, respectively.

The first experimental run involves the solution of a linear system of equations, where the 'naïve' setting of extending an otherwise sequential code with the parallelized BLAS routine 'dgesv' is compared to the code parallelized with OpenMP and to the code parallelized and optimized by the proposed framework. Also, likwid [6] has been used for pinning threads during the latter two trials, leaving the naïve code under the standard regime of Linux both for scheduling threads and for placing memory pages. Table 1 displays various execution times and Fig. 2 displays run times for different problem sizes, ranging from 1024 up to 16384 in steps of 512. Our framework consistently performs better than the other two approaches.

Table 1: Performance of the BLAS dgesv function, in seconds of wall time
Placement method / problem size   4096    8192    12800    16384
naive                             1.04    7.39    13.69    37.79
OpenMP                            0.58    4.09     8.66    24.50
framework                         0.57    4.05     8.58    23.64

Figure 2: solve system of equations, BLAS dgesv

Example two is based on a triple call to BLAS which computes the eigenvalues and eigenvectors of a dense matrix. The placement strategies 'OpenMP' and 'framework' perform better than naïve (single core) memory allocation and placement combined with the lack of thread pinning, by up to some 30%. Nevertheless, the execution times of 'OpenMP' and 'framework' (Fig. 3) are very close and probably within the margins of measurement error, so we cannot declare a clear winning scheme in this realm.

Figure 3: BLAS eigenvector/eigenvalue computation

Example three is based on matrix/matrix multiplication by means of the dgemm function of the BLAS. The implementation is based on a blocking algorithm, which aims at high utilization of the three levels of processor caches in modern CPUs. Thus, dgemm already provides near-optimum performance in its standard incarnation and it should be less amenable to optimized page placements. Fig. 4 displays the performance of dgemm with input matrices A and B being initialized in parallel by the 64 OpenMP threads, with placements done by the proposed framework and also with first-touch based placements resulting from the output of a preceding dgemm operation. So in the latter case we effectively ran two (dummy) dgemm matrix multiplications first. By just employing the first-touch policy within these operations we obtained page placements for the resulting output matrices C1 and C2 that reflect the typical placements of output matrices of a dgemm operation. Only in the last step did we measure the performance of the dgemm operation, now based on C1 and C2 as input matrices. The described setup aims to model the standard data flow of some abstract numerical application, which most often bases the input data of a function on the output data of another, preceding function.

Figure 4: BLAS dgemm performance

Our proposed framework leads to an average increase in performance of 0.5% over the OpenMP-style initialization and placement, while employing the first-touch policy yields an average speedup of 0.35%. Applying Monte Carlo random search on dgemm for 150 steps yields an additional speedup of 1% (Fig. 5). This may sound very modest, but with this optimization mode being easily integrated into existing algorithms it can still be regarded as a meaningful improvement. Also, offline optimization can be used, with the resulting placement vector being retained for later use.

V. CONCLUSION AND OUTLOOK

Based on the widely known BLAS library we have demonstrated a simple but nevertheless effective framework for optimizing data locality within numerical libraries that have not specifically been designed for use on Non-Uniform Memory Architecture-based parallel systems. Our approach operates at the level of memory allocation and only requires an operating system that is capable of a first-touch page placement policy and thread pinning. Thus we regard the proposed framework to be highly portable. Also, no prior knowledge of the implementation details of the numerical library is required by the user in order to enjoy the benefits of the framework.
Figure 5: Monte Carlo optimization over the first 150 samples; validation for iterations [151,300]

Future work will explore a fused approach, which integrates the optimization of both thread placement and page placement as a single objective. This multi-modal optimization can be expected to deliver further improvements in the performance of numerical libraries on modern NUMA systems.

REFERENCES

[1] T. Klug, M. Ott, J. Weidendorfer, C. Trinitis, "autopin – automated optimization of thread-to-core pinning on multicore systems", in: P. Stenström (ed.), Transactions on HiPEAC III, LNCS, vol. 6590, pp. 219-235, Springer, Heidelberg, 2011.
[2] A. Mazouz, S. Touati, D. Barthou, "Dynamic Thread Pinning for Phase-Based OpenMP Programs", Euro-Par 2013 Parallel Processing, pp. 53-64, Lecture Notes in Computer Science, vol. 8097, Springer Berlin Heidelberg, 2014.
[3] F. Gaud et al., "Challenges of Memory Management on Modern NUMA Systems", Communications of the ACM, Vol. 58, No. 12, pp. 59-66, 2015.
[4] G. Kosec, M. Depolli, A. Rashkovska, R. Trobec, "Super linear speedup in a local parallel solution of thermo-fluid problems: OpenMP parallelization of a local PDE solver", J. Computers and Structures, Vol. 133, pp. 30-38, Pergamon Press, Elmsford, NY, USA, 2014.
[5] D. Davidovic, M. Depolli, T. Lipic, K. Skala, R. Trobec, "Energy Efficiency of Parallel Multicore Programs", Scalable Computing: Practice and Experience, Volume 16, Number 4, pp. 437-448, http://www.scpe.org, 2015.
[6] J. Treibig, G. Hager, G. Wellein, "LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments", http://arxiv.org/abs/1004.4431
[7] J. Corbet, "AutoNUMA: The other approach to NUMA scheduling", LWN.net, http://lwn.net/Articles/488709/, 2012.

Use Case Diagram Based Scenarios Design for a Biomedical Time-Series Analysis Web Platform

Alan Jovic*, Davor Kukolja*, Kresimir Jozic** and Mario Cifrek*
* University of Zagreb Faculty of Electrical Engineering and Computing, Zagreb, Croatia
** INA - industrija nafte, d.d., Zagreb, Croatia
Corresponding author: alan.jovic@fer.hr

Abstract - Biomedical time-series analysis deals with the detection, classification and prediction of subjects' states and disorders. In this paper, we present the requirements and scenarios of use for a novel web platform designed to analyze multivariate heterogeneous biomedical time-series. The scenarios of use are described with the corresponding UML Use Case Diagrams. We also discuss some architectural and technological issues, including parallelization, visualization and feature extraction from biomedical time-series. The goal of this paper is to present what we currently consider the best approach to the design of such a system, which may also be beneficial for similar biomedical software systems. The paper is focused on design and architectural considerations only, as the implementation of this complex system has only just begun.

I. INTRODUCTION

Web and mobile application development for biomedical services is continuously growing in the healthcare community [1,2]. While the developed software aims either at general population users [2] or at medical professionals for continuous monitoring of patients [3], there is very little interest in developing an integrative web platform that could benefit medical professionals, researchers and experienced general users in modeling subjects' states and disorders.
The goal of the currently running Croatian Science Foundation research project HRZZ-MULTISAB1 is to develop a high quality system for an all-encompassing analysis of multivariate heterogeneous biomedical timeseries. It is conceived that this system would be used by all interested users, including medical doctors, biomedical engineers, computer scientists, and others, depending on their goal. As the system will be developed as a web platform, the users will be able to access it from afar, which differentiates it from local or hospital-specific software solutions. The problem that the project tackles at its core is the hard problem of efficient biomedical time-series features identification. There have been many efforts put forth to discover and describe both domain-specific and general time-series relevant features [4-6]. The platform that we are currently in the process of developing will offer significant advancements in this respect, as we will consider the use of time-series features there were proven 1 This work has been fully supported by the Croatian Science Foundation under the project number UIP-2014-09-6889. 326 to be effective in a wide range of sciences [6], as well as in specific biomedical signal domains [7]. Once the platform's implementation is completed, we plan to investigate and compare the best feature combinations advocated by domain experts with the best general time-series features and with no-features engineering approach using deep learning approaches [8]. The goal is to achieve as accurate as possible models of subject's states, including various medical disorders. Presently, in this paper, the goal is to describe system requirements for such a web platform and to elucidate the way in which the platform will be built. We focus on the requirements, with a brief overview of the system's architecture and the technologies involved. II. SYSTEM REQUIREMENTS System requirements, which were agreed upon by the entire project working group were the following: • An integrative software solution for the analysis of multivariate heterogeneous biomedical time-series. • Software solution implemented in the form of a web platform, as thin client - fat server. • Software logic layer on the server written in Java programming language. • Interface towards the user implemented with a set of contemporary web development technologies (HTML5, CSS3, TypeScript...). • Multiple input file formats: European data format (EDF) and EDF+, textual format for signals and annotations, images formats. Files that contain metadata (anamnesis, disorder annotations, etc.) • Visualization of signals in 2D (records inspection) and specific body disorders in 3D using graphical hardware. • Biomedical time-series preprocessing, such as signal filtering, R peak detection in ECG series, data transformations in: time, frequency, and timefrequency domains. • Biomedical time-series feature extraction - features would be chosen by: 1) a medical expert system implemented in the platform, which would be based on current medical knowledge from guidelines and MIPRO 2016/DC VIS relevant scientific papers, 2) an expert user, manually; a large number of features need to be supported by the platform, both general and domainspecific biomedical time-series features. • Feature selection methods, classification, regression, and prediction machine learning algorithms should be used to construct accurate subject state models. • Results reporting in contemporary formats. 
The platform will necessary support the analysis of the following biomedical time-series: • ECG – electrocardiogram – cardiovascular activity • HRV – heart rate variability – a series of RRintervals whose variability is analyzed • EEG – electroencephalogram – brain activity • EMG – electromyogram – muscle activity Additionally, optionally, the platform will support: • electrooculogram (EOG), electroglottogram (EGG), cardiotocogram (CTG), electrocorticogram (ECoG), photoplethysmogram (PPG), pressure (arterial blood pressure - ABP, pulmonary artery pressure - PAP, central venous pressure - CVP), respiration (impedance), oxygen saturation (SpO2), CO2, gait rhythm, galvanic skin resistance, etc. The platform will support multivariate heterogeneous analysis (e.g. ECG + HRV + SpO2). In the implementation phase, the focus will be primarily to demonstrate the working capabilities of the platform on a smaller set of signals and medical disorders. III. UML USE CASE DIAGRAMS FOR SYSTEM REQUIREMENTS VISUALIZATION System requirements have been detailed in the form of UML Use Case Diagrams, v.1.4+ [9], drawn in Astah Community edition tool [10], which present how a user would interact with the system. These diagrams provide a behavioral description of a system's use, without insight into a detailed temporal sequence in which the interaction with the system is performed. Regardless of the absence of a temporal view on the interaction process, all the scenarios of use can be easily identified. This facilitates web platform development. 7. Model construction 8. Reporting Some of these phases may be skipped entirely, depending on the user. Additionally, the 9. diagram, User account, depicts registration, login and profile management. Finally, the 10. diagram, Platform administration, shows the administrative access to the web platform. In the subsequent subsections, we present the diagrams and their detailed explanation. A. Analysis Type Selection Analysis type selection, Fig. 1, is conducted in a way that a user chooses the goal for an analysis that he wants to perform: detection, classification, prediction, or inspection and visualization. He also chooses the data type on which the analysis will be based. The data type may include exclusively biomedical time-series or biomedical time-series with additional domain data (e.g. subject anamnesis, metadata which describes record type, etc.). Biomedical time-series may be univariate by type, which means that there is only a single measured data array available in the record, e.g. the times of heart beats, or multivariate, which means that there are several measured data arrays present in the record. Additionally, multivariate data may he homogeneous, which means that they come from the same source measurement device, e.g. EEG data, or heterogeneous, which means that there are more than one measurement device involved at the same time, e.g. ECG + HRV + systolic ABP. An example of analysis type selection would be classification based on EEG data. B. Scenario Selection Scenario selection, Fig. 2, enables the user to select a predefined scenario, based on previously selected analysis type. It also allows the construction of a completely new scenario through full customization. A list of predefined analysis scenarios with all of the corresponding aspects (used components) will be provided to the user. The user will be able to confirm a scenario or change some of its aspects, such as preprocessing methods, feature extraction methods or model construction methods. 
When defining a There were eight UML Use Case Diagrams that we identified, which describe the phases of the analysis: 1. Analysis type selection 2. Scenario selection 3. Input data selection 4. Records inspection 5. Records preprocessing 6. Feature extraction MIPRO 2016/DC VIS Figure 1. Analysis type selection. 327 new custom scenario, or changing an existing scenario, the user will be able to select any of the methods for accomplishing his goals. The predefined scenarios will contain features that are to be extracted from signals, which will be selected by the medical expert system specifically designed for this purpose. The medical expert system will be implemented based on current relevant medical guidelines, standards, and relevant medical and biomedical engineering literature. An example of a scenario would be to detect epilepsy in EEG, using wavelet transform modulus maxima and correlation dimension features, and a C4.5 decision tree. Figure 2. Scenario selection. C. Input Data Selection Input data selection, Fig. 3, includes the choice of file extensions (or file formats) that will be uploaded for the analysis. These formats include EDF, EDF+ (with or without annotations), separate textual annotation files (e.g. heart beat annotations), textual raw signal files (without annotations) and image files (e.g. JPEG, TIFF). The user will have to select the appropriate file format in order for file upload to work without errors. Aside from selecting the format, the user also specifies the files that will be uploaded or selects the folder from which all the required files can be found. If there are some specific details related to records loading into platform (e.g. only partial file loading, limitations for some image formats, etc.), these can also be specified during this phase. D. Records Inspection After the records are uploaded into the platform, it will be possible to inspect and visualize (Fig. 4) each loaded record through the platform's graphical user interface. Here, the user will be able to: select specific record segment, select temporal and amplitude scaling of the signals, display some or all of the signal trails (arrays), inspect record headings and domain data (if such information exists). 3D visualization will be enabled for certain specific states for some biomedical signals (e.g. visualization of myocardial infarction location based on ECG). Additionally, the user will be able to annotate some of the available signals or change the existing annotation files and save these changes. E. Records Preprocessing Records preprocessing, Fig. 5, assumes various procedures for signal filtering and data transformations. Signal filtering includes noise filtering (e.g. notch filters for 50 / 60 Hz), baseline wandering corrections (e.g. for ECG because of breathing, body movements or electrodes impedance changes), records segmentation into windows of certain width. Window width may be chosen by the user or it may be implicit, determined by the very features that are to be extracted (e.g. a single RR interval for detailed ECG analysis). Other filtering methods include various linear and nonlinear filtering procedures with the goal to obtain higher quality signals. Data transformations will be performed after the filtering procedures and will include various time (e.g. principal component analysis - PCA), frequency (e.g. fast Fourier transform, Hilbert Huang transform), timefrequency (e.g. wavelet transform) and other types of data 328 Figure 3. Input data selection. Figure 4. 
Records inspection. transformations. The goal of data transformations is to obtain the data in a form from which it is easier to calculate precise domain or general features for describing specific subjects' states. F. Feature Extraction Feature extraction, Fig. 6, is the central step in the analysis of biomedical time-series. The user will already be provided with a list of features in the scenario selection phase. Regardless of the scenario, in the feature extraction phase, it will be possible so select additional features and MIPRO 2016/DC VIS cancel feature extraction execution at any time. The user will be allowed to save the extracted feature vectors in a file for future analysis, if he wants to have the intermediate results recorded. G. Model Construction Model construction, Fig. 7, is a complex step in the analysis of biomedical time-series that takes place after feature extraction and includes various machine learning algorithms for the analysis of the extracted feature vectors. In this step, the user will be able to load the file with feature vectors that were calculated and saved earlier and resume the analysis or continue the analysis from the recently obtained and unrecorded extracted feature vectors. Model construction phase starts with the option to select feature vectors' preprocessing methods. Herein, one can use feature vectors' dimensionality reduction methods such as various transformations (e.g. PCA) or feature selection methods (filters, wrappers, embedded, and other methods [11]). Figure 5. Records preprocessing. Dimensionality reduction is necessary in the case where a large number of features were extracted (either by the choice of the expert system or by the user's selection). The goal of dimensionality reduction is to keep only relevant and non-redundant features in order to improve speed and accuracy of model construction. Aside from feature space dimensionality reduction, it will be possible to use some methods to alter the feature vectors themselves (e.g. resampling, missing values replacement, etc.). Figure 6. Feature extraction. specify parameters for calculation of a particular feature, for those features that are parametric. When starting feature extraction, the user will be able to inspect the information about the estimation of the analysis duration and will be able to modify the type of calculation parallelization that will be performed. It will be possible to After the feature vectors preprocessing step, the user will be able to select a model construction method. During this step, first a method is selected and then, the method parameters are defined. Method parameters selection will be made possible if the modeling method is parametric (e.g. C4.5 decision tree, random forest). Before starting model construction, it is necessary to specify the learning method that will be used: holdout Figure 7. Model construction. MIPRO 2016/DC VIS 329 procedure, where a part of dataset is used training and the other part for testing, or cross-validation, where testing is performed on dataset segments so that all the dataset is eventually used for testing. It will also be possible to perform only model testing on a set of new samples, if there is a model present in the system that was trained earlier on the same feature set. When starting model construction, the user will be able to inspect the information regarding the estimation of the construction duration and will be able to modify the type of computing parallelization that will be performed. 
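The paper does not name a specific machine learning library for the model construction step. Purely as an illustration, the following sketch shows how the workflow just described (load previously saved feature vectors, build a C4.5 decision tree, evaluate it with cross-validation, and store the trained model) could look in Java using the Weka toolkit; the file names and the assumption that the class label is the last attribute are ours, not the platform's.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

public class ModelConstructionSketch {
    public static void main(String[] args) throws Exception {
        // Load previously saved feature vectors (hypothetical file name).
        Instances data = DataSource.read("extracted_features.arff");
        // Assume the class label (e.g. the diagnosed disorder) is the last attribute.
        data.setClassIndex(data.numAttributes() - 1);

        // C4.5 decision tree (J48 is Weka's C4.5 implementation).
        J48 tree = new J48();

        // 10-fold cross-validation, one of the learning methods mentioned above.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.println(eval.toSummaryString("\nCross-validation results\n", false));

        // Build the final model on all data and save it for later reuse.
        tree.buildClassifier(data);
        SerializationHelper.write("c45_model.model", tree);
    }
}
```

A holdout evaluation, the other learning method mentioned above, would instead split the data into separate training and test sets and call eval.evaluateModel(tree, test).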
The construction procedure could be cancelled at any time. After model construction, it will be made possible to save the trained model into a file for later use. Figure 8. Reporting. H. Reporting Reporting, depicted in Fig. 8, is the step in which a user chooses the way in which the results of the analysis are displayed. Thereafter, the report is displayed in the selected form (on a web page only, in PDF, MS Excel, Word or ODF form). It will be possible to save the copy of the report onto the client's computer, whereas its copy will be stored on the server. Additionally, the user will be able to grant access to the report to other registered users, at his convenience. I. User Account An account for a new user will be opened by registering the user onto the platform, which will include entering the data about the user (necessary data: user name, password and e-mail address; additional data: first and last name, title, affiliation, affiliation's address, telephone), Fig. 9. Logging into the system will be allowed after confirmation that the registration was accepted. Profile management will be enabled after a successful login. It will include editing personal information, accessing generated (finished) reports that are linked to the user or resuming the last analysis conducted by the user. Platform Administration Platform administrator will have to login into the platform in the same way as the user, by entering his user name and password, Fig. 10. The administrator will have at his disposal a number of options. First, he will be able to edit his personal profile. He will also be able to manage user accounts for all the platform's users. This will include opening a new user account (as if a user has registered), confirming the registration of a new user, and removing an existing user account. For each account, administrator will be able to inspect the user's personal information (except password), without the possibility of modification. Figure 9. User account. J. The administrator will be able to inspect and search all of the reports generated in the platform. Also, he will be allowed access to all of the other files stored in the system, including all the saved input files, all the feature vectors files, and all the constructed models file. He will be able to delete them at his convenience. Moreover, the administrator will have access to all platform's logs, which will be sorted by date. The logs will contain relevant information about the users' actions during the use of the 330 Figure 10. Platform administration. platform. Additional information about the platform as well as its usage statistics will also be at his disposal. IV. PLATFORM ARCHITECTURE A. Web Technologies System architecture was envisioned as a web portal, which will be accessed by the users through their browsers. This type of architecture was selected, as it is necessary to enable access to the platform from any remote location. With this, the range of potential users is widened, which is important for platform recognition, acceptance, and breadth of applications. MIPRO 2016/DC VIS On the client hand side, during platform development, HTML5, CSS3, and Typescript languages will be used for designing web pages. As aids in developing web pages, Angular 2 framework will be used for programming and Bootstrap framework will be used for better visual experience and platform responsiveness. Also, WebGL technology will be used for 3D visualization of human body sections that are of interest to the user. 
On the server hand side, Java programming language will be used. This language was chosen because of a large number of existing signal processing libraries, data parsing, implemented machine learning algorithms, ease of web development, and efficient parallelization support. Server side will communicate with the client side through the RESTful protocol. In order to lower the requirements on the server's resources, we will test the use of 1) a standard Java application server with a minimum set of options, and 2) a stand-alone framework that supports the required functionality (e.g. Spring Boot). A minimum set of options will include a user authentication library and an object relational mapping library (e.g. Hibernate). For reporting, JasperReports® Library, an open-source Java library that supports all the required formats (including HTML) will be used. required goal could be determined. Auxiliary packages would contain only signatures (name, parameters) of the available components for a specific analysis phase. The platform will contain a large number of biomedical time-series features, both general and domainspecific, i.e. related to a specific type of time-series (e.g. EEG). In the platform, we plan to use the features from HRVFrame [5], EEGFrame [13] and Comp-Engine [6] frameworks, with appropriate modifications, language translations and testing. Additionally, other domainspecific frameworks, such as the one for ECG analysis will have to be implemented mostly from scratch. V. We have shown the requirements and architecture for an integrative biomedical time-series analysis web platform. Future work will involve implementing the requirements and reporting on the achieved modeling results of the subjects' states. REFERENCES [1] B. Architectural Details The developed architecture should be both modular and scalable. Its modularity should enable the development of various add-ons, while its scalability should support the concurrent work of many users. The platform should also support two important aspects of contemporary software: 1) parallelization and 2) dynamic module loading. [2] Parallelization will be used to accelerate calculations, which is especially efficient for the algorithms that can be divided into independent execution sections, without the need for synchronization. Parallelization will be enabled in two phases of the analysis: for feature extraction and for model construction, because these phase are the most computationally demanding. We may also consider employing it for records preprocessing, if deemed necessary. In any case, the goal will be to maximize the resource capacity on the server with respect to the number of available CPU cores, as well as employing general purpose GPU parallelization. GPU parallelization will be achieved using JCuda or jocl frameworks, depending on the available hardware. The platform will evaluate the available hardware and the number of running processes in order to enable maximum support for a user's demands. [5] Dynamic module loading will be a useful feature of the platform, which is significant from the perspective of server resources' optimization. If, for example, a user wants to classify ECGs using C4.5 decision trees, without any additional algorithms, then there will be no need to load all the other platform modules into memory. The modules will be implemented so that they will have maximum cohesion of services within the module, while retaining minimum coupling between other similar modules [12]. 
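As a concrete illustration of the server-side choices described at the beginning of this section (Java, RESTful communication with the client, and a stand-alone framework such as Spring Boot), a minimal endpoint might look as follows. The endpoint path, the returned values and the class name are hypothetical; they only indicate the shape of such an interface, not the platform's actual API.

```java
import java.util.List;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@RestController
public class AnalysisApi {

    // Hypothetical endpoint: list predefined scenarios for a chosen analysis goal.
    @GetMapping("/api/scenarios/{goal}")
    public List<String> scenarios(@PathVariable String goal) {
        // In the real platform this would query the medical expert system.
        return List.of(goal + ": EEG epilepsy detection",
                       goal + ": ECG arrhythmia classification");
    }

    public static void main(String[] args) {
        SpringApplication.run(AnalysisApi.class, args);
    }
}
```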
This will be enabled through an early definition of analysis scenario (see section III. B). The exact sequence of module execution to achieve the MIPRO 2016/DC VIS CONCLUSION [3] [4] [6] [7] [8] [9] [10] [11] [12] [13] J. Oster, J. Behar, R. Colloca, Q. Li, Q. Li, and G. D. Clifford, “Open source Java-based ECG analysis software and Android app for atrial fibrillation screening,” in: Computing in Cardiology Conference (CinC) 2013, IEEE, pp. 731–734, 2013. A. Szczepanski and K. Saeed, “A Mobile Device System for Early Warning of ECG Anomalies,” Sensors, vol. 14, pp. 11031–11044, 2014. G. Paliwal, A. W. Kiwelekar, “A Product Line Architecture for Mobile Patient Monitoring System,” in: Mobile Health, Vol. 5 of Springer Series in Bio-/Neuroinformatics, pp. 489–511, 2015, doi: 10.1007/978-3-319-12817-7_22. A. Bravi, A. Longtin, and A. J. E. Seely, “Review and classification of variability analysis techniques with clinical applications,” BioMed. Eng. OnLine, vol. 10, p. 90, Oct. 2011. A. Jovic, N. Bogunovic, and M. Cupic, “Extension and Detailed Overview of the HRVFrame Framework for Heart Rate Variability Analysis,” in: Proceedings of the Eurocon 2013 Conference, I. Kuzle, T. Capuder, and H. Pandzic, Eds. Zagreb: IEEE Press, pp. 1757–1763, 2013. B. D. Fulcher, M. A. Little, and N. S. Jones, “Highly comparative time-series analysis: the empirical structure of time series and their methods,” J. Roy. Soc. Interface, vol. 10, p. 20130048, April 2013. A. Jovic and N. Bogunovic, “Evaluating and Comparing Performance of Feature Combinations of Heart Rate Variability Measures for Cardiac Rhythm Classification,” Biomed. Signal Process. Control, vol. 7 no. 3, pp. 245–255, May 2012. I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, Book in preparation for MIT Press, 2016, url: http://goodfeli.github.io/dlbook/. Object Management Group, “Unified Modeling Language® (UML®) Resource Page”, http://www.uml.org/, last accessed on: 2016-02-21. Change Vision Inc., “Astah, Community Edition,” http://astah.net/editions/community, last accessed on: 2016-02-21. A. Jovic, K. Brkic, N. Bogunovic, “A review of feature selection methods with applications,” in: Proceedings of the MIPRO 2015 Conference, P. Biljanovic, Ed., Rijeka: MIPRO Croatian Society, 2015, Rijeka: MIPRO Croatian Society, pp. 1447–1452, 2015. K. Praditwong, M. Harman, X. Yao, “Software Module Clustering as a Multi-Objective Search Problem,” IEEE Trans. Software Eng., vol. 37, no. 2, pp. 264–282, April 2011. A. Jovic, L. Suc, and N. Bogunovic, “Feature extraction from electroencephalographic records using EEGFrame framework,” in: Proceedings of the MIPRO 2013 Conference, P. Biljanović, Ed., Rijeka: MIPRO Croatian Society, pp. 1237–1242, 2013. 331 1 Augmented Reality for Substation Automation by Utilizing IEC 61850 Communication M. Antonijević*, S. Sučić* and H. Keserica * * Končar-Power Plant and Electric Traction Engineering Inc, Zagreb, Croatia {miro.antonijevic; stjepan.sucic; hrvoje.keserica}@koncar-ket.hr Abstract - IEC 61850 standard represents the most commonly used communication technology for new substation automation projects. Despite the fact that IEC 61850 provides a semantic data model and a standardized configuration description, these facts are underutilized in substation automation management today. This is specifically illustrated in the data visualization domain where new technologies such as virtual and augmented reality have reached significant maturity levels and have not been used for IEC 61850 system visualization so far. 
In this paper IEC 61850 features have been combined with augmented reality technologies for providing added value visualization capabilities in substation automation domain. The developed prototype demonstrates proof-of-concept solution for regular substation automation checks and maintenance activities. I. INTRODUCTION Smart Grid automation has introduced significant novelties in different Smart Grid subsystems including, Distributed Energy Resources (DERs), distribution automation (DA) and substation automation systems (SAS). One of the elementary prerequisites for successful automation is unified communication mechanisms that facilitate subsystem remote monitoring and control. International standard IEC 61850 represents one the of the Smart Grid automation pillars by introducing standardized communication principles and semantic description of controlled systems. Existence of standards-compliant semantics that describe power system process data allow new possibilities for developing unforeseen applications used in SAS domain. An example of these application categories are Augmented Reality (AR) applications that can utilize meta-data provided by IEC 61850 standard and directly access process related information in order to provide added value features for SAS management. This paper analyses possibility of using IEC 61850 standard for developing AR applications used for SAS maintenance activities. The paper is organized as follows. The following chapter provides an overview of AR technology usage in industrial automation systems. The third chapter describes main features of IEC 61850 standard and AR technologies in order to give an overview of possible use in SAS environment. The fourth chapter describes main functionalities of proposed AR application, its architecture and implementation approach. The fifth chapter provides analysis results of current developed prototype while the last chapter gives a conclusion. 332 II. AUGMENTED REALITY IN INDUSTRIAL SYSTEMS Industrial machines have become advanced tools where automation and advanced feedback is made possible through dedicated control computers. The computers allow us to fully or partially automate complex procedures and can help make manual control more secure and precise. Real-time data from the process is available and many parameters can be interactively controlled through the computer interface. Automation can also allow an operator to monitor multiple machines simultaneously, reducing the number of required personnel for a machine pool. Many critical procedures exist, however, that cannot be completely automated. In such cases, the operator might need to be able to visually follow and interactively control parts of the current operation, while simultaneously monitoring numerous rapidly changing parameters. The maturity of AR technologies has allowed its usage in different industrial automation environments. AR technologies have been successfully applied in manufacturing [1], smart building management [2], automotive and aerospace industries [3]. The AR application in SAS environment have already been used in [4] and [5]. Hoverer, in both articles AR is used for simulation purposes and operator training. Neither of aforementioned articles deals with process related SCADA used for SAS maintenance purposes as proposed in this paper. III. 
IEC 61850 AND AUGMENTED REALITY IEC 61850 [6] is often regarded as just another remote control protocol for electric utilities, but this international standard is more than a set of rules and encodings for data retrieval from field devices. IEC 61850 defines automation architecture requirements for utility subsystems in order to enable communication and semantic interoperability among multi-vendor equipment. Despite being primarily developed for substations [7], IEC 61850 is now extended for wind power plant domain [8], Distributed Energy Resources (DERs) [9] and hydropower plants [10]. Information modeling scope Server 1…n Logical Device (LD) 1…n Logical Node (LN) Application scope Data Set (DS) 1…n 0…n 1…n Data Object (DO) 1 1…n Data Attribute (DA) Functional constraint (FC) 1…n 1…n 1…n  Figure 1 IEC 61850 data class model MIPRO 2016/DC VIS 2 A. IEC 61850 basics 1) Data model Data semantics provided by IEC 61850 are closely related to the functionalities of devices in utility subsystems such as DERs [9]. IEC 61850 data models [11] are based on objectoriented modelling of data relevant to process automation. Figure 1 shows relationships between IEC 61850 data model classes. The top parent class is the Server which represents a physical device, i.e., a device controller. The Server consists of one or more Logical Devices (LDs), i.e., virtual representations of devices intended for supervision, protection or control of automated system. LDs are created by combining several Logical Nodes (LNs) which represent various device functionalities. LNs are a crucial part of IEC 61850 data semantics. 2) Data exchange The ACSI is a novel paradigm, introduced by IEC 61850, for describing data exchange procedures in utility subsystems such as substations [11]. ACSI model classes define abstracted information services used for vertical and horizontal communication among IEC 61850 devices. ACSI is not a protocol but a method to tie IEC 61850 abstract services to application layer protocols such as MMS [12]. These ACSI model classes can be used as standardized information interfaces for devices which are realized as IEC 61850 servers. Thus, any IEC 61850 enabled client software can take full advantage of their remote control. 3) Managing and engineering IEC 61850 systems The engineering process for IEC 61850 systems is based on exchange of XML documents which are formatted according to the System Configuration description Language (SCL) [13]. There are several SCL document types depending if they describe device or the integrated system itself. The engineering process based on SCL document exchange is relatively static and most commonly used for substation automation systems where communication system and electric network topology are predefined. B. Augmented reality – main features AR is a computer technology that augments the real environment through visually represented information. Thus, to find the opportunities for AR applications in industrial systems it is necessary to review the main AR features to suggest the solutions based on AR. The main AR features are as follows:  AR can follow the user's viewpoint by means of a tracking system;  AR can superimpose virtual objects onto the user's view of a real world scene;  AR can render the combined image of virtual objects and a real world scene in real time;  AR can locate virtual objects in a real world scene in correct scale, location and orientation. 
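To make the containment hierarchy from Section III.A (Server, Logical Device, Logical Node, Data Object, Data Attribute) more tangible, the following minimal Java sketch models it and builds an object reference string similar to the ones used later in the paper. The class names are illustrative and deliberately simplified (real IEC 61850 data attributes can be nested more deeply); this is not code from any IEC 61850 library.

```java
// Simplified sketch of the IEC 61850 containment hierarchy from Section III.A:
// Server -> Logical Device (LD) -> Logical Node (LN) -> Data Object (DO) -> Data Attribute (DA).
import java.util.List;

record DataAttribute(String name, String functionalConstraint) {}
record DataObject(String name, List<DataAttribute> attributes) {}
record LogicalNode(String name, List<DataObject> dataObjects) {}
record LogicalDevice(String name, List<LogicalNode> nodes) {}
record Server(String name, List<LogicalDevice> logicalDevices) {}

class ObjectReference {
    // Builds a reference such as "myLD/MMXU1.PhV.phsA" from the hierarchy.
    static String of(LogicalDevice ld, LogicalNode ln, DataObject dob, DataAttribute da) {
        return ld.name() + "/" + ln.name() + "." + dob.name() + "." + da.name();
    }
}
```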
The control computer often has access to a large amount of process information and we can expect this to increase as the systems become more sophisticated and complex. It is thus important that the data is clearly presented and easily accessible, in order to avoid unnecessarily attentiondemanding interfaces and information overload for the user. MIPRO 2016/DC VIS Today’s control computers typically present their data on a traditional computer display and often use a keyboard/mouse or a touch-screen as input devices. The operator observes the process through the machine’s safety glass, while using a computer to the side for control and feedback. This setup results in divided attention if the operator has the need to both follow the procedure visually and simultaneously monitor important values on the computer display. While this problem can in part be addressed through display placement, it inherently separates the process data from the process itself. Goal of this paper is to provide solution of integrating the process data with the workspace behind the safety glass using AR technology. AR allows combining interactive computer graphics with real objects in a physical environment, such as the workspace of an industrial machine. It is particularly suitable for today’s increasingly complex industrial machine processes as it enables intuitive representation and real-time visualization of relevant information in the right place. There are several situations where it might be advantageous to have the capability of annotating the real process with virtual information. An operator might for example want to indicate or emphasize locations inside the workspace behind the safety glass to a co-worker or a student. The type, dimensions and state of the tool currently in operation may be indicated by a virtual label. Process simulation in the real machine using virtual tools and virtual materials could increase safety through virtual previews of the procedure, and provide implicit visual warnings from unintentional geometrical inconsistencies. 1) The ideal augmented reality interface integrates computer graphics with a real environment seamlessly and without encumbering technology. The long tradition of mobile AR systems has required systems based on head mounted displays (HMD) that involve complex tracking and equipment to be worn by the user such as the Oculus Rift. The popularity of video see-through systems can be attributed to the relative ease of implementation and rapid prototyping possibilities provided through various software libraries, such as ARToolkit [14]. Even spatial AR systems like the ASTOR system [15] have been in research and have provided very good results. The majority of AR applications today are marker-based requiring specific markers for 3D tracking and positioning. In this paper QR (Quick Response) [16] has been used as a marker. 2) Augmented reality markers There are a lot of different types of markers used in AR and although all are applicable in certain situations, one of them, the QR code, is being used a lot more than any of its competitors. Its name is an abbreviation of Quick Response Code and it is a 2D matrix barcode developed by Desno Wave Corporation in 1994 [16], [17]. A QR Code is capable of handling many types of data, such as numeric and alphabetic characters. One pattern can encode up to 7,089 numeric characters or 4,296 alphanumeric characters. 
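For generating such QR codes in a Java toolchain, the ZXing library is one commonly used option; note that the prototype described later in this paper generates its codes with the Python qrcode package instead, so the sketch below is only an illustrative alternative. The payload and output file name are examples.

```java
import java.nio.file.Path;
import java.util.Map;

import com.google.zxing.BarcodeFormat;
import com.google.zxing.EncodeHintType;
import com.google.zxing.client.j2se.MatrixToImageWriter;
import com.google.zxing.common.BitMatrix;
import com.google.zxing.qrcode.QRCodeWriter;
import com.google.zxing.qrcode.decoder.ErrorCorrectionLevel;

public class QrGeneration {
    public static void main(String[] args) throws Exception {
        // Encode an IEC 61850 object reference (the example path used later in the paper).
        String payload = "myLD/MMXU1.PhV.phsA.cVal.mag";

        BitMatrix matrix = new QRCodeWriter().encode(
                payload, BarcodeFormat.QR_CODE, 300, 300,
                // Higher error correction tolerates partially damaged or occluded codes.
                Map.of(EncodeHintType.ERROR_CORRECTION, ErrorCorrectionLevel.H));

        MatrixToImageWriter.writeToPath(matrix, "PNG", Path.of("transformer_phase_a.png"));
    }
}
```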
Despite a large chunk of data that can be stored in one QR code it can be decoded by a very lightweight smartphone application or a lightweight PC application with camera access, unlike other 2D barcodes which usually require a specific scanner to help decode them. ARToolkit, the library that is used in this project has its 333 3 own markers, but due to QR’s superiority it has been decided to use QR codes instead of AR’s traditional markers. A QR Code is quite similar to an ARToolkit marker in appearance (Figures 2 and 3), but can encode more information than ARToolkit markers. The goal is to use QR codes as superior markers and combine them with AR techniques by using QR Codes as traditional ARToolkit markers. Doing the registration process of traditional markers can be omitted therefore simplifying the procedure. Also encoding large amounts of data enables developing more complex and interesting applications. TABLE 1 ERROR CORRECTION LEVEL OF A QR CODE [18] Level L M H Q TABLE II COMPARISON OF QR CODE WITH FIDUCIAL MARKERS [18] Feature Need to pre-register Model storing Limited number of markers Universality IV. Figure 2 Fiducial markers Alignement pattern Position Area Percentage of codewords that can be restored (Approximation) 7% 15 % 25 % 30 % QR Code No Internet Larger Fiducial Marker Yes Local Smaller Universal barcode Standalone IEC 61850 AR APPLICATION Developing IEC 61850 AR application is based on utilizing IEC 61850 vertical communication, data engineering trough SCL and QR codes as data markers for identifying parts of the SAS equipment according to the IEC 61850 data semantics. The SCL files provide information about semantically annotated process data. This information is correlated with QR markers assigned to SAS interior (transformers, circuit barkers, switches, feeders, etc.). The QR image recognition recognizes markers, the AG projector draws additional information on the maintenance engineer smartphone screen. The process information overlay SAS equipment directly connecting marked equipment with process information in real time available through SCADA system. By integrating AR the SAS maintenance activities can be significantly simplified allowing unambiguous equipment identification, providing real time measurements and eased SAS monitoring. A. Data Area Figure 3 . IEC 61850 QR code Figure 3 shows an example of a QR code and highlights its parts. A QR code’s most important sections are three large square patterns (each contains a small black square with a white border) which are positioned in three corners of the code. They are used to determine the position of the code. Version 2 (and above) expands on this by adding an align square used to align the code after detection. The rest of the code is used to draw a large number of small blocks that encode the data in the code. An additional benefit of this type of encoding is that the code can be reconstructed even if a part of it was damaged or is missing. This is important when scanning the code with a camera because the camera is never going to be positioned perfectly and parts of the code will often be omitted. There are a couple of correction levels of a QR code which can be seen on Table 1. Higher correction levels allow more of the code to be missing with the drawback of increasing the code size. Superiority of a QR code over other fiducial markers can be seen in Table 2 [18]. These are the arguments for using QR codes in SAS AR applications. 
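Once a QR payload such as myLD/MMXU1.PhV.phsA.cVal.mag has been decoded, it has to be split into its IEC 61850 constituents before the matching process data can be requested. A minimal, hypothetical Java sketch of that parsing step (not taken from the described prototype) could look like this:

```java
// Illustrative parsing of a decoded QR payload such as "myLD/MMXU1.PhV.phsA.cVal.mag"
// into its IEC 61850 parts (logical device, logical node, and the attribute chain).
record ProcessDataPath(String logicalDevice, String logicalNode, String[] attributeChain) {

    static ProcessDataPath parse(String payload) {
        String[] ldRest = payload.split("/", 2);   // "myLD" and "MMXU1.PhV.phsA.cVal.mag"
        String[] parts = ldRest[1].split("\\.");   // "MMXU1", "PhV", "phsA", "cVal", "mag"
        String[] chain = java.util.Arrays.copyOfRange(parts, 1, parts.length);
        return new ProcessDataPath(ldRest[0], parts[0], chain);
    }
}
```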
Application architecture
The application architecture includes several components. The smartphone camera captures the QR code and extracts the process data identifier based on IEC 61850. The recognized path represents semantically annotated process data according to the IEC 61850 communication standard. This data is correlated with the SCADA process data, which is retrieved from the internal SCADA memory database. The SCADA system gathers data from Intelligent Electronic Devices (IEDs) via vertical IEC 61850 communication. The application architecture is shown in Figure 5.

B. Implementation approach
All substation primary equipment elements have been marked with a QR code which describes the path to the uniquely identified process data according to the IEC 61850 standard. Figure 3 shows an example of the QR code for process data representing a transformer voltage measurement in phase A (myLD/MMXU1.PhV.phsA.cVal.mag). The library used for generating QR codes is qrcode 5.2.2 [19], a simple Python library that generates a QR code from the supplied data. The data serves as identification information for the entity that the QR code is placed on (for example, the transformer mentioned above).

Using a device with a camera, such as a smartphone, the QR code is first detected with the ARToolkit libraries and its contents are processed. Once it is known which IEC 61850 process data needs to be displayed, the application connects to a SCADA data exchange interface and asks for the data. The SCADA is then responsible for fetching the data and sending it back to the AR application. Once the data has arrived, the information is overlaid over the camera feed; the type of overlay will usually depend on the type of data shown. This is also done using the ARToolkit libraries for overlaying information. In most cases a simple value and name are overlaid over the marker. After the initial overlay, the SCADA is asked to provide real-time updates of the data in question as long as the marker is in the camera's vision. This real-time data is provided by the commercial SCADA system PROZA NET. The updates are then used to refresh the overlaid image, so that real-time data is shown on the camera feed as long as the marker is in its line of sight.

Figure 4 Pattern detection

ARToolkit can calculate the percentage of confidence with which a pattern was recognized. If it is above 50%, the pattern is treated as a good candidate for detection. If three candidates are detected in a single frame, their spatial relationship is calculated, and those that do not follow the right-triangle rule set by QR codes are discarded. Figure 4 shows the spatial relationships that need to be satisfied. After determining the three position detection patterns, ARToolkit's capabilities can be used to find the fourth corner.

Locating the code is a big step towards understanding it, and after it has been located in the 3D world it is necessary to decode the information contained in it. Many programs used for decoding markers require the user to place the marker exactly in front of the camera in order to decode the information correctly. Since QR codes are used as ARToolkit markers, the user cannot be required to position the camera in such a way, paying attention to the position and orientation of the camera with respect to the code. Because of that, the QR code must be aligned to facilitate the extraction of its information. The code is always projected from one plane onto the camera image plane, so its orientation can be restored using a perspective transformation, which can be written as

\[
\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} =
\begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]

where (x, y) are the original coordinates of the QR code and (x', y', w') are the aligned homogeneous coordinates of the same code. The remaining parameters h_ij can easily be estimated using the coordinates of the four corners of the code. This perspective transformation can now be used to determine the proper orientation of the code. Once the QR code is readable by the application, it is quite simple to use the information to ask the SCADA system for the data and to overlay the data over the QR code using ARToolkit.

V. PROTOTYPE EVALUATION
The evaluation is based on the developed application prototype applied in a real-world substation. The preliminary results show that QR codes are a good choice of markers for IEC 61850 data semantics. Figure 6 shows the application in action. So far the application is in its infancy, as can be seen from the screenshot in Figure 6, but the results are promising enough for us to be confident that it will be highly applicable in real-world situations and that the real-time data updates will be attractive to end users.

Figure 5 Application architecture

Figure 6 Screenshot of the application in a real world situation

VI. CONCLUSION
In this paper it has been shown how AR technologies can be utilized in SAS systems. For demonstration purposes, an AR application has been developed based on QR markers in order to correlate semantically annotated process data with real-time SCADA data. It has been shown that combining standards-based communication and AR technology can provide value-added features for substation maintenance and regular inspection, thereby reducing the amount of time and effort required from utility maintenance engineers. This paper presents a solution applied in SAS environments. However, since IEC 61850 semantics covers several new automation domains, it can easily be applied to subsystems such as DERs, wind power plants, etc.

REFERENCES
[1] S. K. Ong and A. Y. C. Nee, Virtual and augmented reality applications in manufacturing. Springer Science & Business Media, 2013.
[2] L. Romualdo Suzuki, K. Brown, S. Pipes, and J. Ibbotson, Smart building management through augmented reality, in Pervasive Computing and Communications Workshops (PERCOM Workshops), 2014 IEEE International Conference on, 2014, pp. 105–110.
[3] H. Regenbrecht, G. Baratoff, and W. Wilke, Augmented reality projects in the automotive and aerospace industries, Comput. Graph. Appl. IEEE, vol. 25, no. 6, pp. 48–56, 2005.
[4] T. R. Ribeiro, P. R. J. dos Reis, G. B. Júnior, A. C. de Paiva, A. C. Silva, I. M. O. Maia, and A. S. Araújo, Agito: Virtual reality environment for power systems substations operators training, in Augmented and Virtual Reality, Springer, 2014, pp. 113–123.
[5] P. R. J. dos Reis, D. L. G. Junior, A. S. de Araújo, G. B. Júnior, A. C. Silva, and A. C. de Paiva, Visualization of Power Systems Based on Panoramic Augmented Environments, in Augmented and Virtual Reality, Springer, 2014, pp. 175–184.
[6] IEC, Communication Networks and Systems in Substations - ALL PARTS, Int. Std. IEC 61850-SER ed1.0, 2011.
[7] IEC, Communication networks and systems for power utility automation - Part 7-4: Basic communication structure - Compatible logical node classes and data object classes, Int. Std. IEC 61850-7-4 ed2.0, 2010.
[8] IEC, Wind turbines – Part 25-2: Communications for monitoring and control of wind power plants – Information models, Int. Std. IEC 61400-25-2 ed1.0, 2006. [9] IEC, Communication networks and systems for power utility automation - Part 7-420: Basic communication structure - Distributed energy resources logical nodes, Int. Std. IEC 61850-7-420 ed1.0, 2009. [10] IEC, Communication Networks and Systems for Power Utility Automation - Part 7-410: Hydroelectric Power Plants Communication for Monitoring and Control, Int. Std. IEC 61850-7-410 ed1.0, 2007. [11] IEC, Communication networks and systems for power utility automation - Part 7-2: Basic information and communication structure Abstract communication service interface (ACSI), Int. Std. IEC 618507-2 ed2.0, 2010. [12] IEC, Communication networks and systems in substations – Part 8-1: Specific Communication Service Mapping (SCSM) – Mappings to MMS (ISO 9506-1 and ISO 9506-2) and to ISO/IEC 88023,IEC Std. IEC 61850-8-1 ed1.0, 2004. [13] IEC, Communication networks and systems for power utility automation - Part 6: Configuration description language for communication in electrical substations related to IEDs, Int. Std. 618506, ed2.0, 2009. [14] Open Source Augmented Reality SDK | ARToolKit.org. [Online]. Available: http://artoolkit.org/. [15] A. Olwal, J. Gustafsson, and C. Lindfors, Spatial augmented reality on industrial CNC-machines, in Electronic Imaging 2008, 2008, pp. 680409–680409–9. [16] QR code, Wikipedia, the free encyclopedia. 21-Feb-2016. [17] K. Ruan and H. Jeong, An augmented reality system using Qr code as marker in android smartphone, in Engineering and Technology (S-CET), 2012 Spring Congress on, 2012, pp. 1–3. [18] T.-W. Kan, C.-H. Teng, and W.-S. Chou, Applying QR code in augmented reality applications, in Proceedings of the 8th International Conference on Virtual Reality Continuum and its Applications in Industry, 2009, pp. 253–257. [19] qrcode 5.2.2 : Python Package Index. [Online]. Available: https://pypi.python.org/pypi/qrcode. MIPRO 2016/DC VIS Innovation of the Campbell Vision Stimulator with the Use of Tablets J. Brozek*, M. Jakes ** and V. Svoboda** * Department of information technologies, University of Pardubice, Pardubice, Czech republic Department of software technologies, University of Pardubice, Pardubice, Czech republic mail@jobro.cz, jakes@asote.cz, svoboda@asote.cz ** Abstract - The article covers three fundamental themes: a) performance solutions using gaming to treat multiple eye defects; in particular - Amblyopia; b) an explanation of the issue and design of the software (including games) which is intended for therapeutic or health purposes; and c) highlighting the modern solutions and the power of software products for the needs of the health sector, in particular in the fields of diagnostics and rehabilitation. The reader can learn basic information about eye diseases and the principles of their treatment, and become acquainted with the reasons why video games are appropriate for rehabilitation. Very important and beneficial for the reader is the section which focuses on a) the differences in the standard software and the healthcare software, b) the high risks associated with defects of software, or even the risk of side effects with the so-called „perfect software”, c) the fact that a major part of software development does not comply with all of the standards. health care cannot be arranged as a standard software, because the risks are too high, and the consequences could be disastrous. 
Therefore, the design of the software (including games) must be executed exceptionally well and according to valid standards. B. Amblyopia Our solution primarily targets Amblyopia, also known as the lazy eye disease. According to various sources, 34% of the people suffer from this disease. The disease almost exclusively affects only one eye. It can occur without symptoms, or with some visible symptoms, such as strabismus, closing the affected eye, or fear to close the healthy eye. For more information see [1]; [2], [3] or [4]. The article also discusses the advantages of the software solution over other methods of rehabilitation. Most of the paradigms are generally applicable. Familiarity with the principles of this application can thus be interesting even for developers in the relevant areas. I. INTRODUCTION We need to introduce the issue from several different perspectives, such as historical context, scope of the diseases and how to cure them. A. Context of the issue There are 500 million people suffering with one of many eye defects at the present time. The current level of human knowledge shows, that 80% of these defects can be treated. Some can be treated surgically, others using conservative methods. Currently however, the option of conservative treatment (rehabilitation) is dramatically changing. In the past, a rehabilitation option have been severely limited and has had unverifiable results. Modern times, however, brings an ideal device for rehabilitation of eye defects – the tablet. The combination of the intuitive touch input, directly on the visualization unit appears to be ideal; but the level of effectiveness, however, depends on appropriate software. Figure 1: Health and Amblyopic vision The disease occurs in early childhood (during the time of the development of vision). As a result of a physical defect, the brain never learns how to properly use signals from the eye. Over time, the brain can completely suppress signals from the affected eye. So, amblyopia is a mixed disorder of eye and brain. According to the current level of knowledge, treatment options are possible only until about 12 years of age, in a period when the brain is still able to get basic sensory perception. However, it should be noted that some studies suggest that the disease can be partially corrected in later age. Treatment principles for Amblyopia are explained in the one of the following chapters. However, the problem occurs right at the time of the design and implementation of the software. Solutions for MIPRO 2016/DC VIS 337 C. Binocular cooperation Binocular stereoscopic vision (or cooperation), is absolutely necessary for people in the modern world. After the composition of picture from both eyes, the brain is able to project a third dimension. It means, that when only using one eye, it is not possible to define a distance (warning; trying to emulate this state by short time covering one eye does not work due to the experience of the past, and previous knowledge of sizes and distances). Fair binocular cooperation is essential not only for activities, such as cycling, or driving the vehicle, but also for simple hand-eye coordination. The problem with the lack of binocular cooperation arises generally because of: 1) cooperation never being learned, or 2) gradually going blind in one eye. It is possible to say that an age limit for healing Amblyopia is set primarily by the period, when it is possible to teach the brain binocular cooperation of both eyes. 
Rehabilitation will focus on efforts to practice cooperation; for example hand-eye coordination for which stereoscopic vision is necessary. (E.g. threading beads on a thread.) For more information see [5] or [6]. II. CURENT THREATMENT METHODS Amblyopia is a combined (the eye-brain) disease, so then the treatment must respect that. As a first step it is necessary to perform a check of the eye and, if appropriate, remove or correct its physical inadequacy. This correction rarely needs intervention, and most patients need only glasses. The second phase of treatment lies in the practical application of conservative methods (sometimes referred as rehabilitation). All conservative methods have a common basis. Before therapy starts, the patient’s healthy eye is prevented from seeing, by using a special shield – occluder. This measure, and on the one hand increases the discomfort of the patient, on the other hand, it is a way how to force the use of the affected eye. The brain cannot ignore the signals from the affected eye for long, and moreover must learn to work with the eye. Gradually this leads to sharpened vision. The later phases of treatment require the alternate involvement of both eyes (to avoid its degeneration). In the third phase of the deployment comes the treatment of binocular cooperation. The treatment itself can be accommodated in a variety of ways. During the processing of the work we used primarily [7], [8], [9], [10]. A. Conservative It is completely conservative principle of treatment. It is based in the fact that, in addition to the blinding of a healthy eye no special measures are taken. This is the oldest and least effective type of treatment. It can only be effective in cases of slight damage. Its big advantage is price. The level of discomfort and treatment duration are major disadvantages. exercise takes place so that the patient simply looks for some particular character. The pace of the workout is determined by the patient. It practices substantial stimulation of vision and the need to select information. The slight drawbacks are: 1) quick eye fatigue, 2) it is not training for various distances, and 3) the exercise requires a high degree of concentration. C. A conservative with multimedia, or computer using Modern times allow with a TV, computer or tablet to treat patients by a conservative method, but without the personal risk of being in the physical world. Poor vision can no longer become a cause of accidents and stumbles. Using an occluder and viewing the television or computer reduces risks. On the other hand, narrowing the rehabilitation method restricts the effect of vision stimulation, and loses overlap to motoric and the broader context of eye use (compared with the classic conservative method). In addition, it eliminated the benefit of stimulating the eye for multiple distances. The biggest problem, however, is the very questionable quality applications (and games) and some phenomena while watching TV. The method seems modern and efficient, but already has a number of disadvantages. It is further extended by the inability to adapt the pace of the patient. There is thus a real risk that patient does not improve, or conversely, that there is a deterioration in the condition of the patient. D. CAM Campbell view stimulator, briefly CAM, was developed in the Soviet union round the year 1960 and subsequently was upgraded several times. Actually, it was the first solution that was designed directly by doctors. 
Because the custom solution presented in this paper is based on the CAM, it will be described in more detail. CAM is a physical (the most times) electromechanical device that consists of three plates. Far away from the patient is a rotating plate with black and white squares or spirals. Closer to the patient is a transparent plate with the image. The closest part is the glass or foil, on which the patient can draw. The principle of treatment can be likened to a smarter coloring book. These coloring pages are on the moving background. During tracing or rendering the eye is strongly stimulated (even more than is stimulated by the view in the real world). An advantage of the solution is also an implicit connection between treatment of Amblyopia and strengthening the link between vision and motorics. CAM (figure 2) is an elderly solution and current technologies allow its significant evolution. B. Search in text One of the old, but still effective methods is considered to be a combination of the blinding of a healthy eye and search in a text (e.g. in the book). The 338 MIPRO 2016/DC VIS requirements of the market (special chapter focusses on market needs). The solution is shown at figure 3. Figure 2: Campbell vision stimulator III. OUR SOLUTION Custom solutions have a historical context and a link to one of the current core developers. The history of evolution illustrated, that the solution is proven. During 10 years there were many steps to the current version and this is first non-medical paper, which explains principles to the IT community. The development started when the brother of one of the developers began to be treated for Amblyopia. Although the CAM secured the fastest guarantee for success, the old electromechanical CAM was available only in the regional hospital, about 25 km away. The treatment has been complicated by distance, or by the need to keep the patient in the hospital for several weeks (the patient was 5 years old). For this reason the first efforts to create a physical copy of a CAM device started. Unfortunately, the requirements in the area of health care in the combination of electronics and mechanics could not be met. The first generation fails. The second generation solution came in 2008. The principle of the devices used a standard LCD monitor, through which the glass was installed. On the glass, it was possible to draw and trace. It was a mere evolution of an electromechanical CAM, which in principle was nothing new, it just ensured greater availability of treatment. The turning point came in the year 2011, i.e. in the period, when tablets in the Czech Republic began to be affordable. The third generation solution has been implemented on a Tablet PC. Dimensions touch control, and availability for individual households changed the face of treatment. The solution was gradually modified and finally in 2013 under the name ANNA (Czech female first name) the technique started to be used in the Czech Republic and neighboring countries. Pardubice Regional Hospital, which has co-worked on the development, still uses this solution. This generation extended the principles with the after-action review system, and “Heal & Play”. This version is very popular among child patients. 
The last version was released in the year 2015 under the name Anna II, with a version that in addition to the requirements of doctors and patients, also reflects the MIPRO 2016/DC VIS Figure 3: Software solution on tablet The development of a custom solution is fixed, but for a long time without publishing output. The solution won a few prestigious awards, illustrating that the solution really works. The application itself has gained 1st prize in the Business ideas shows consecutively in the years 2014 and 2015. In 2013, it was awarded a special prize by the Dean (head) of faculty. In 2014 the solution received a special prize from the Rector (head of the University) for outstanding achievement in research and innovation. In 2015, our solution was nominated for the forthcoming research prize given by the National Government of the Czech Republic. A. The issue of programming for the needs of the health care Each developer knows or feels intuitively, that the different types of software are subject to various claims; not only in terms of functionality, but also, for example, in testing methodologies, in development methodologies, or even in the restrictions for some of the used algorithms. A demonstration example, made for the need of teaching, or other simple web applications, does not require complex systems design, project management, or structural control. Often enough, the solution works. However, there are also a set of issues, on which human life or health depends. In applications for health care purposes, very strict rules of programming apply. The defect or unexpected behavior of such a program should have disastrous consequences. It is possible to produce relatively simple programs, which should be developed in a few weeks. However, can you afford to take such a risk? A poorly designed program could endanger the health and lives of users. Such a gamble would be on your conscience if you think only of the financial situation, and utilize incorrect software. Solution without standard is not possible to insure. As the result of the major risks are great demands on the software. During the development, a series of standards must be followed, and the correct development procedures respected (for example, a very often applicable solution is similar to the methodology of the unified process). All must be properly documented and for virtually every step you need to create additional documents. This leads to a 339 situation, where the (programing) code has thousands of rows and project documentation has a length of ten thousand rows. Additional ten thousand rows then requires the creation of tests and their results. All of the forms and procedures are standardized. For the design of the software you need to use ISO 90003 [11], for creating health solution ISO13485 [12], ISO testing 17025 [13] and 29119 [14-16], for the proper team management and documentation ISO 21500 [17], 12207 [18] and 9000 [19] and you need to know many others of them. Software development is so immensely more difficult. Although the two applications may look the same, but the cost of solution witch respects all the standards is ten times more, in comparison to a solution that is not limited by the standards. In addition, it should be noted, that standards may limit the methods of programming, or even restrict the allowed algorithms. For example, a pragmatic approach teaches the programmers to use algorithms and data structures from the character of their average (often amortized) complexity. 
The Splay tree is a data structure with advanced heuristics, the “move to front” operation improves it performances in many classes of application, unlike other forms of search trees. The Quicksort sorting algorithm is often used for its simplicity and speed. Unfortunately, the standards do not allow the solution, which has a very good average complexity, but the scope focuses on its worst-case complexity (and it is many times not so good in advanced data structures). Therefore, we look for a solution in the selection of algorithms according to the ratio of their performance, and taking into account the worst-case situation. Methodology for software testing is described very comprehensively in standards. All testing must be conducted against the protocols, and many characteristics of the device, so that the error was virtually eliminated. Each test must be carried out at several levels and must always fulfil the standard conditions of the request. At the same time, it is necessary to test the substandard conditions, influenced by the external environment, and of course the hazards. The software developer must realize (and document) the following types of tests (minimally): unit tests, integration tests, acceptance tests, factory tests, user side tests, integrity tests, tests the stability of the environment. After completion of the test, records of the tests much be archived for a minimum of a program lifetime (standards usually stored 10 years after the closing of the official aid, and the aid must be provided normally at least two years after end of app distribution). Figure 4: Development demands Successful completion of effective phases of software tests, from the developer’s side, finalizes the development part. The issue does not end there. In the next stage, you should test the software from many other aspects. For example, our solutions have to go through a whole range of display tests. A test of maximum dynamic image changes (high value of dynamic changes of the image would lead to risk of launch epilepsy bout), normal contrast test, specific contrast test (contrast during nonstandard lighting) and many others must be measured. After all of these tests, it is possible, to declare the prototype as ready for first testing on patients. After a further lengthy time, the product is also ready for clinical tests. B. Base principle As well as the CAM, our solution is based on active workplace with homogeneously changing background. Every background is changing at the same rate (usually a rotation around a constant Center of rotation), but with different grain size (resolution) of the background. Background with smaller grain sizes (DPI-the underlying shape) creates a simpler solution for the patient. Background with a higher grit is very challenging. Patients with higher degrees of disabilities are not able to work on the environment with a higher grit. The foreground application varies according to the selected mode. It is possible to choose the mode of the coloring book (thus running mode, which emulates the CAM and brings into the advanced features), or run one of the many games. The principle is based on the actual painting book. Firstly, patient traces the strokes outline with his finger and thereby draw by selected color. After drawing around the outlines the patient can select the color and with finger holding should fill image with the colors. The maximum therapeutic efficiency is achieved during tracing process. 
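As a rough illustration of the base principle described above (a background rotating at a constant rate around a fixed center, with a configurable grain size that scales the difficulty), the following Java sketch computes the checkerboard pattern for a pixel at a given time. It is a simplified model only; an actual implementation would render this through the tablet's graphics API, and the class and parameter names are ours.

```java
public class RotatingBackground {
    private final double grainSizePx;   // square size in pixels; controls difficulty
    private final double angularSpeed;  // radians per second, constant rotation rate
    private final double cx, cy;        // centre of rotation

    public RotatingBackground(double grainSizePx, double angularSpeed, double cx, double cy) {
        this.grainSizePx = grainSizePx;
        this.angularSpeed = angularSpeed;
        this.cx = cx;
        this.cy = cy;
    }

    /** Returns true if pixel (x, y) is black at time t (in seconds), false if white. */
    public boolean isBlack(double x, double y, double t) {
        double a = angularSpeed * t;
        // Rotate the pixel back by the current angle around the centre of rotation.
        double rx = Math.cos(-a) * (x - cx) - Math.sin(-a) * (y - cy);
        double ry = Math.sin(-a) * (x - cx) + Math.cos(-a) * (y - cy);
        // Checkerboard pattern with the configured grain size.
        long ix = (long) Math.floor(rx / grainSizePx);
        long iy = (long) Math.floor(ry / grainSizePx);
        return ((ix + iy) & 1L) == 0L;
    }
}
```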
The rendering is only a complementary part of the solution to the patient's picture finished. After he finish, he can Save the picture for future seeing of his progress (of his accuracy) or for printing. The coloring pages are equipped with several interesting solutions, such as snapping to lines with a scalable tolerance (which brings a higher success rate and 340 MIPRO 2016/DC VIS higher motivation), after action review system, and techniques for changing the difficulty of painting. C. Heal & Play Heal & Play is a designation used in the Department of software technology of University of Pardubice, where the development is for the health sector using the games. The name is self-explanatory. It is in the process of registration, but there is no further protection. This does not serves as a standard, but for the description of program purpose. All games are usually very simple and use the same principle with dynamic background as CAM. The aim of the games is not provide a comprehensive gaming experience, but to make the treatment has become a pleasant and popular activities, at least enough to it is possible. The benefits of the games for the whole solution are clear; the treatment is becoming, to be popular way of time spending. Thanks to a greater spectrum of the games is to ensure that the player does not start to be bored with them. The determination of the appropriate level of difficulty, length and scaling of variables that affect the effectiveness of the treatment, are currently the most effective solution for the rehabilitation equipment in this category. IV. DISCUSSION Successive experiments, interviews with physicians in the development of cooperation, or from simple feedback from users, it is possible to characterize some of the profiling features of our solution. A. The availability of the solution Free interdisciplinary synthesis of proven principles of the CAM with the programming techniques for Tablet, delivers the benefits of fantastic availability. The tablets are currently expanding strongly in most households, where they are relatively cheap to buy. In addition, the advantage of the use of tablets is that they have a very intuitive, and easily controlled by most people. B. Mobility Unlike the original CAM, a great advantage of our solution is united mobility. Treatment can take place by default; in a treatment facility, or in the comfort of your home, or even while traveling. Thanks to this property there has been a dramatic increase in the satisfaction of the treatment. Especially for small children, it is important that the principles of comfort and safety at home are adhered to. C. The risk of the solution Each tablet is an electronic bench-mounted unit, which is certified for its operation, so it is safe to use in hospitals and homes. The risks associated with software are eliminated thanks to the creation of a solution in accordance with all the standards, thanks to the very high quality of testing. MIPRO 2016/DC VIS D. Snapping The software solution enables you to deploy the routines that significantly facilitate the use of the software, or allow you to scale the difficulty of the individual treatments. An example might be: when a child is asked to trace an outline. The software allows you to set, for example, a 0.5 cm tolerance such that the traced line will perfectly to snap onto the intended target. In a paper book, or CAM, such implementation is not possible. This approach has many advantages. 
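A minimal sketch of how such snapping could be implemented is shown below: the traced touch point is snapped to the nearest outline point whenever it lies within the configured tolerance (converted from centimetres to pixels using the device DPI). The class and method names are hypothetical, and, as the text suggests, the tolerance could be reduced progressively to raise the demands on the patient.

```java
import java.awt.geom.Point2D;
import java.util.List;

public class OutlineSnapper {
    private final double toleranceCm;
    private final double pixelsPerCm;

    public OutlineSnapper(double toleranceCm, double deviceDpi) {
        this.toleranceCm = toleranceCm;
        this.pixelsPerCm = deviceDpi / 2.54; // convert dots per inch to pixels per centimetre
    }

    /**
     * Returns the closest outline point if the touch lies within the tolerance,
     * otherwise the raw touch point (no snapping).
     */
    public Point2D snap(Point2D touch, List<Point2D> outlinePoints) {
        Point2D best = null;
        double bestDist = Double.MAX_VALUE;
        for (Point2D p : outlinePoints) {
            double d = touch.distance(p);
            if (d < bestDist) {
                bestDist = d;
                best = p;
            }
        }
        double tolerancePx = toleranceCm * pixelsPerCm;
        return (best != null && bestDist <= tolerancePx) ? best : touch;
    }
}
```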
This approach has many advantages. The patient feels a greater degree of progress, the overall appearance of the coloring book is better, and diagnostics are easier. Tolerance, however, should be used with caution and progressively reduced, so that the demands placed on the patient gradually increase.

E. Fun

The implementation of games in a certified therapeutic device has produced excellent results. The children at whom the software is targeted began to take the treatment as fun and repeatedly asked for the healing cycle themselves. This removes the problem of children refusing treatment, because the treatment is fun. It is another benefit of our solution that it combines high treatment efficiency with a good measure of fun. Most of the games built on this principle are very simple, so as not to disturb the healing principle, but they are designed so that their difficulty can be increased, and the patient can very easily fail when playing them. Keeping the games challenging is more important than high-quality graphic design (graphics with full textures could very easily degrade the therapeutic effect of the treatment).

F. The secondary effects of treatment

Using the tablet as both the I/O device and the display unit creates certain benefits for the user. Unlike other solutions already used during the treatment of amblyopia (e.g. almost all conservative ones using information technology), the ties between vision and motor abilities are strengthened spontaneously. The binding of the eye and the hand is created in such a pleasant manner that it may make some specialized rehabilitation treatments unnecessary.

G. Facilitating the evaluation of the impact of treatment

The after-action review system is very convenient for patients and doctors. By storing statistics, the progress of the treatment of every individual patient can be monitored very closely. Thanks to this, appropriate therapeutic schemes can be set, taking into account the needs of each individual patient. The system also runs on separate devices used under long-term medical supervision; in this case, automatic processes recommend optimal settings according to the patient's degree of disability.

V. CONCLUSION

The article introduced the principles that are used to treat amblyopia. At the same time, the paper tried to capture enough of the software development process for health care to be of most benefit to the reader. The reader should obtain information on the issues involved in developing solutions for the health sector and the required steps, and also get some inspiration for their own solution. The entire article tries to show that games on a tablet can achieve excellent results and improve the lives of thousands of people. The development of such an application, which appears to be relatively simple, is nevertheless extremely challenging, because it is not easy to meet all the standards. The complexity, however, should not deter developers, since the standards are built to protect users and, in fact, even the developers.

ACKNOWLEDGMENT

The paper was partly financed by a student grant of the University of Pardubice.

Classification of Scientific Workflows Based on Reproducibility Analysis

A. Bánáti(1), P. Kacsuk(2,3) and M. Kozlovszky(1,2)
(1) Óbuda University, John von Neumann Faculty of Informatics, Biotech Lab, Bécsi str. 96/b, H-1034 Budapest, Hungary
(2) MTA SZTAKI, LPDS, Kende str. 13-17, H-1111 Budapest, Hungary
(3) University of Westminster, 115 New Cavendish Street, London W1W 6UW
{banati.anna, kozlovszky.miklos}@nik.uni-obuda.hu, kacsuk@sztaki.mta.hu

Abstract - In the scientific community one of the most vital challenges is the reproducibility of a workflow execution. The parameters necessary for the execution (we call them descriptors) can be external, depending for example on the computing infrastructure (grids, clusters and clouds) or on third-party resources, or internal, belonging to the code of the workflow, such as variables. During re-execution these parameters may change or become unavailable, and they can ultimately prevent the workflow from being reproduced. However, in most cases the lack of the original parameters can be compensated for by replacing, evaluating or simulating the values of the descriptors, at some extra cost, in order to make the workflow reproducible. Our goal in this paper is to classify scientific workflows based on the method and the cost by which they can become reproducible.

I. INTRODUCTION

For large computational challenges, scientific workflows have emerged as a widely accepted solution for performing in-silico experiments. In general these in-silico experiments consist of series of particularly data- and compute-intensive jobs, and in most cases their execution requires parallel and distributed infrastructure (supercomputers, grids, clusters, clouds).
The successive steps of an experiment are chained into a so-called workflow, which can be represented by a directed acyclic graph (DAG). The nodes are so-called jobs, which include the experimental computations based on the input data accessed through their input ports. These jobs can produce output data, which can be forwarded through their output ports to the input ports of the next job. The edges of the DAG represent the dataflow between the jobs (Figure 1).

Figure 1. Workflow example with four jobs (J1, J2, J3, J4)

An essential part of the scientific method is to repeat and reproduce the experiments of other scientists and to test the outcomes, even in a different execution environment. A scientific workflow is reproducible if it can be re-executed without failures and gives the same result as the first time. In this context, failures do not mean failures of the Scientific Workflow Management System (SWfMS), but concern the correctness and the availability of the inputs, libraries, variables, etc. Different users may be interested in reproducing the workflow for different purposes: the authors of the workflow in order to prove their results, readers or other scientists in order to reuse the results, or reviewers in order to verify the correctness of the results [1]. Additionally, scientific workflow repositories are now available, so scientists can share their results with each other and even reuse existing workflows to create new ones.

The two most significant obstacles to reproducing a workflow are the dependencies of the workflow execution and the rich collection of provenance data. The former can be regarded as the necessary and the latter as the sufficient requirement of reproducibility. The dependencies of the execution are those resources which require external services or resources (outside the scientific workflow management system, SWfMS), such as third-party services, special hardware/software, or random value generators [2]. Eliminating these dependencies is in most cases not possible, so they have to be handled in some other way: different methods should be set up to make the workflows reproducible.

To achieve our goal we have defined the descriptor space and the decay-parameters of the jobs, which give us the possibility to analyze a workflow from a reproducibility perspective. The descriptor space contains all the parameters (called descriptors) that are necessary to reproduce the workflow. There are descriptors which are constant and do not change in time. Other descriptors are continuously changing (for example, a database that continuously receives more and more data from sensor networks). There may also be descriptors based on external services (such as third-party services) which can become unavailable after a few years. Finally, there are descriptors which are unknown and whose behavior is unpredictable; in this case the workflow is non-reproducible. The decay-parameter describes the type and the measure of the change of the descriptor. With the help of the decay-parameter we have determined five categories of workflows: reproducible, reproducible with extra cost, approximately reproducible, reproducible with probability P, and non-reproducible.
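The DAG-of-jobs model described above can be made concrete with a small sketch; the dictionary-based representation and the four toy jobs are illustrative assumptions, not the authors' workflow system.

```python
# Minimal sketch (assumed representation, not an actual SWfMS): a workflow as a
# DAG of jobs whose edges forward an output port of one job to an input port of
# the next, in the spirit of the four-job example of Figure 1.
from collections import defaultdict

class Job:
    def __init__(self, name, compute):
        self.name = name
        self.compute = compute                 # dict of inputs -> dict of outputs

def run_workflow(jobs, edges, initial_inputs):
    """Execute jobs in the given (already topological) order and route the dataflow."""
    inputs = defaultdict(dict, {jobs[0].name: dict(initial_inputs)})
    results = {}
    for job in jobs:
        results[job.name] = job.compute(inputs[job.name])
        for src, src_port, dst, dst_port in edges:    # forward data along DAG edges
            if src == job.name:
                inputs[dst][dst_port] = results[src][src_port]
    return results

# J1 feeds J2 and J3; J4 combines their outputs (toy computations).
J1 = Job("J1", lambda d: {"x": d["seed"] + 1})
J2 = Job("J2", lambda d: {"x": d["x"] * 2})
J3 = Job("J3", lambda d: {"x": d["x"] * 3})
J4 = Job("J4", lambda d: {"x": d["a"] + d["b"]})
edges = [("J1", "x", "J2", "x"), ("J1", "x", "J3", "x"),
         ("J2", "x", "J4", "a"), ("J3", "x", "J4", "b")]
print(run_workflow([J1, J2, J3, J4], edges, {"seed": 1}))   # J4 -> {'x': 10}
```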
The goal of our investigation is to find methods to make the workflows in the different categories reproducible, even if this requires extra cost or compromises. In certain cases this goal is achievable, but often the result of the workflow can only be estimated with the help of simulations. If there is no method to make a workflow reproducible, our goal is to provide the scientist with useful information about the conditions and the probability of the reproducibility of the workflow.

The rest of the paper is organized as follows. In the next section we provide a short background and an overview of work related to our research. Section 3 presents the mathematical model of our reproducibility analysis. In Section 4 we give the classification of scientific workflows based on our analysis. In Section 5 we define the general measures of the reproducibility analysis based on our model. Finally we summarize our conclusions and outline potential future research directions.

II. STATE OF THE ART

The reproducibility of scientific workflows is currently a burning question which the scientific community has to face and solve. Accordingly, in the last one or two years many researchers have investigated this issue. One part of the literature analyzes the requirements of reproducibility, and the other part deals with the implementation of suitable tools or frameworks.

The first group agrees on the importance of careful design [3], [4], [5], [6], [7], which on the one hand means increased robustness of the scientific code, for example through a modular design, a detailed description of the workflow and of the input and output data examples, and consistent annotations [8]. On the other hand, careful design includes the careful usage of volatile third-party or special local services. In these cases two solutions exist, although reproducibility still cannot be guaranteed: 1. taking a digital copy of the entire environment using a system virtual machine / hardware virtualization approach; 2. capturing and storing metadata about the code and environment that allow it to be recreated later [8].

Zhao et al. [9] investigate the causes of so-called workflow decay, which means that year by year the ability to re-execute any workflow successfully is significantly reduced. They examined 92 Taverna workflows submitted in the period from 2007 to 2012 and found four major causes: 1. missing volatile third-party resources, 2. missing example data, 3. missing execution environment (requirement of special local services), and 4. insufficient descriptions of workflows. Hettne et al. [10] list ten best practices to prevent workflow decay. Groth et al. [11] analyze the characteristics of applications used by workflows and list the requirements for enabling the reproducibility of results and the determination of provenance. In addition to the aforementioned requirements, they assume the deterministic behavior of applications in order to perform appropriate provenance collection.

There are available tools, such as VisTrails, ReproZip or PROB [12], [13], [14], which allow researchers and scientists to create reproducible workflows. With the help of VisTrails [12], [15] a reproducible paper can be created, which includes not only the description of the scientific experiment but also all the links to input data, applications and visualized output, which always harmonize with the actually applied input data, filters or other parameters. ReproZip [13] is another tool, which stitches the detailed provenance information and the environmental parameters together into a self-contained reproducible package.

The Research Object (RO) approach [16], [17] is a new direction in this research field.
RO defines an extendable model which aggregates a number of resources in a core unit, namely: a workflow template; workflow runs obtained by enacting the workflow template; other artifacts, which can be of different kinds; and annotations describing the aforementioned elements and their relationships. Following the RO approach, the authors in [18] also investigate the requirements of reproducibility and the information necessary to achieve it. They created ontologies which help to make these data uniform. These ontologies can support our work and give us a basis to perform our reproducibility analysis and to make the workflows reproducible despite their dependencies. Piccolo et al. [19] collected the tools and techniques and proposed six strategies which can help scientists create reproducible scientific workflows. Santana-Perez et al. [20] proposed an alternative approach to reproducing scientific workflows which focuses on the equipment of a computational experiment. They have developed an infrastructure-aware approach for the conservation and reproducibility of computational execution environments based on documenting the components of the infrastructure.

To sum up the results mentioned above, we can conclude that the general approach is that the scientist has to create reproducible workflows with careful design and appropriate tools and strategies. None of these works, however, intended to solve the problems related to the dependencies; rather, they suggested bypassing them. Moreover, they did not deal with the following question: how can an existing workflow be made reproducible?

III. THE MODEL

In our approach a scientific workflow consisting of N jobs can be written as a function of its jobs:

SWf(J_1, J_2, ..., J_N) = R    (1)

where R is the vector of results. In our investigation we assume that a given workflow has been executed at least once and that the provenance database of the workflow execution is available. In this case we can assign a so-called descriptor space to every job of the given workflow:

D_{J_i} = {d_{i1}, d_{i2}, ..., d_{iK_i}}    (2)

The elements of this descriptor space are called descriptors, and they give all the parameters necessary to reproduce the job. These parameters can be, for example, variables of the infrastructure, variables of the code, parameters of system calls, inputs, outputs and partial data, or access paths of external resources [21]. Every descriptor has a name and a value. In addition, we assign to each of them a so-called decay-parameter, which describes the type and the measure of the change of the given value. The decay-parameter can be zero, which means that the value of this descriptor does not change in time; in other words, the availability of this descriptor (and its value) can be ensured in one, two, ten or any number of years. In this case the descriptor does not cause a dependency and the reproducibility of the job does not depend on it. The decay-parameter can be infinite if the descriptor's value is unknown, for example in the case of randomly generated values. The value of the decay-parameter can be a distribution function F(t) if the availability of the given resource varies in time according to F(t). The fourth option is that the value of the decay-parameter is a function vary(t, x) depending on time, which determines the variation of the descriptor's value. Formally:

decay(v_i) =
  0, if the value of the descriptor is not changing in time;
  ∞, if the value of the descriptor is unknown;
  F_i(t), the distribution function of the availability of the given value;
  Vary_i(t, v_i), if the value of the descriptor is changing in time.    (3)

TABLE 1. THE DESCRIPTOR SPACE OF A JOB AND ITS MEASURES
Descriptor's name | Descriptor's value | Decay-parameter | Cost
d_1 | v_1(d_1) | decay(v_1) | c_1
d_2 | v_2(d_2) | decay(v_2) | c_2
… | … | … | …
d_K | v_K(d_K) | decay(v_K) | c_K

The descriptors and their decay-parameters can originate from three different sources: from the users, from the provenance database, or they can be generated automatically by the SWfMS [21].

With the help of these expressions we can define reproducibility in the following way.

Definition: The job J_i is reproducible if

JOB_i(t_0, v_{i1}(d_{i1}), v_{i2}(d_{i2}), ..., v_{iK_i}(d_{iK_i})) = JOB_i(t_0 + Δt, v_{i1}(d_{i1}), v_{i2}(d_{i2}), ..., v_{iK_i}(d_{iK_i})) = R_i    (4)

for every Δt. In addition, if a scientific workflow contains N jobs and all the jobs are reproducible, the scientific workflow is also reproducible:

SWF(t_0, J_1, J_2, ..., J_N) = SWF(t_0 + Δt, J_1, J_2, ..., J_N) = R    (5)

for every Δt.

We can also assign a cost to the descriptors. This gives a measure of the "work" that is necessary to make the job reproducible. For example, when the value of the descriptor is a large amount of data which cannot be stored even on extra storage, we can assign a cost to this extra storage. Another example is when the descriptor is changing in time and its decay-parameter is a so-called "vary function"; in this case, to reproduce the workflow we can apply simulation tools based on the sample set, which also results in an extra cost (see Section IV.C).
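To make the descriptor-space model above concrete, here is a minimal sketch; the data layout and the example values are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (assumed data layout, not the authors' SWfMS): a descriptor
# carries a value, a decay-parameter as in Eq. (3), and a compensation cost.
import math
from dataclasses import dataclass

@dataclass
class Descriptor:
    name: str
    value: object
    decay: object        # 0, math.inf, a distribution F(t), or a vary(t, v) function
    cost: float = 0.0    # "work" needed to compensate for this descriptor

def job_descriptor_space(descriptors):
    """Eq. (2): the descriptor space of a job is simply the set of its descriptors."""
    return {d.name: d for d in descriptors}

# Example job with one constant input, one third-party service and one random seed.
space = job_descriptor_space([
    Descriptor("input_path", "/data/run1.csv", decay=0),
    Descriptor("external_service", "http://example.org/service",
               decay=lambda t: 1.0 - math.exp(-0.2 * t),   # illustrative F(t)
               cost=5.0),
    Descriptor("rng_seed", None, decay=math.inf, cost=1.0),  # unknown: must be captured
])
print(sorted(space))
```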
IV. THE CLASSIFICATION

By analyzing the decay-parameters of the descriptors we can classify the scientific workflows. First, we can separate the workflows whose decay-parameters are zero for all jobs. These workflows are reproducible at any time and under any circumstances, since they do not have dependencies. Then we can determine those descriptors which can influence the reproducibility of the workflow, in other words those which have non-zero decay-parameters. Four groups have been created:

1. At least one decay-parameter of a descriptor is infinite, but with the help of additional resources or tools this dependency of the execution can be eliminated. In this case the cost of the descriptor indicates that there is a possibility to reproduce the job with some extra cost.
2. At least one decay-parameter of a descriptor is infinite and the cost of this descriptor is also infinite. In this case the dependency of the workflow cannot be eliminated and the workflow is non-reproducible.
3. At least one decay-parameter of a descriptor is a probability distribution function and the other ones are zero.
4. At least one decay-parameter of a descriptor is a vary function and the other ones are zero. (Table 2)

TABLE 2. CLASSIFICATION OF SCIENTIFIC WORKFLOWS
decay-parameter | cost | category
decay(v) = 0 | cost = 0 | reproducible
decay(v) = ∞ | cost = ∞ | non-reproducible
decay(v) = ∞ | cost = C1 | reproducible with extra cost
decay(v) = F(t) | cost = C2 | reproducible with probability P
decay(v) = vary(t,v) | cost = C3 | approximately reproducible

A. Reproducible workflows

The first group represents the reproducible workflows. In this case all the decay-parameters of all the jobs belonging to a workflow are zero. These workflows are reproducible; they can be executed and re-executed at any time and under any circumstances, since they are not influenced by dependencies.

B. Reproducible workflows with extra cost

There are workflows which have dependencies and infinite decay-parameters, but whose corresponding cost is not infinite. In this case, with the help of additional resources or tools, these dependencies can be eliminated. For example, if a computation is based on a randomly generated value, this descriptor's value is unknown (infinite).
In this case, with the help of an extra operating-system-level tool, we can capture the return value of the system call and save it in the provenance database [22]. A third example is when a virtualization tool, such as a virtual machine, has to be applied to reproduce the workflow.

C. Approximately reproducible workflows

In certain cases the workflow execution may depend on a continuously changing resource. For example, there are continuously growing databases which receive data from sensor networks without interruption. If the computation of a workflow uses some statistical parameters of such a database, the statistical values will never be the same. In this case the corresponding descriptor's value of the given job may change on every re-execution, and consequently the reproducibility of the workflow may fail. If the workflow was executed S times and the provenance database is available, we can create a sample set which contains the S different values of the changing descriptors and the S results of the workflow. We can then analyze the change of the descriptor's value, write its function and even determine a general method for estimating the result. On the occasion of a later re-execution, if reproduction is not possible, this estimation method can be applied and an estimated result can be produced with a given probability [22].

D. Reproducible workflows with a given probability

Many investigations have revealed the problem caused by volatile third-party resources […], when the reproducibility of workflows becomes uncertain. Third-party services or any external resources can become unavailable over the years. If we know this decay of the resources and can determine its probability distribution function, we can predict the behavior of the workflow on the occasion of a re-execution at a later time. Sometimes the users may need to know the chance of the reproducibility of their workflow. Assuming that the probability distribution of the third-party service is known or can be assumed, we can inform the users about the expected probability of the reproducibility.

To formalize the problem, we first separate the M_i descriptors of a given job J_i which depend on external or third-party resources and whose decay-parameters are probability distribution functions, given as F_{i1}(t), F_{i2}(t), ..., F_{iM_i}(t). The rest of the descriptors have zero decay-parameter. In this case, at time t_0, a given descriptor's value v_{ij}(d_{ij}) is available with a given probability (for the sake of easier comprehensibility we hereafter omit the index i referring to the i-th job of a given scientific workflow):

F_1(t_0) = p_1^{(t_0)}, F_2(t_0) = p_2^{(t_0)}, ..., F_M(t_0) = p_M^{(t_0)}    (6)

Let us assign to the job J_i a state vector y_i = (y_{i1}, y_{i2}, ..., y_{iM_i}) ∈ {0,1}^{M_i}, in which y_{ij} = 1 if the j-th descriptor of the job J_i is unavailable. In this way the probability of a given state vector y_i can be computed as follows:

p(y) = ∏_{j=1}^{M} p_j^{y_j} (1 − p_j)^{1−y_j}    (7)

In addition, a time interval can be given during which the descriptor is available with a given probability P. Since we assume the independence of the descriptors, the cumulative distribution function of the job J_i can be written as follows:

F_i(t) = ∏_{j=1}^{M_i} F_{ij}(t)    (8)
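The state-vector formulation can be evaluated by brute-force enumeration of the 2^M states. The sketch below does this for Eq. (7) and for the Average Cost and Reproducibility Probability measures defined in Section V; the probabilities and costs are invented, and reading g(y) as the summed costs of the descriptors in state y_j = 1 is an interpretive assumption.

```python
# Illustrative sketch (made-up numbers): enumerate the 2^M descriptor states,
# compute p(y) as in Eq. (7), and derive the AC and RP measures of Section V.
from itertools import product

def state_probability(y, p):
    """Eq. (7): p(y) = prod_j p_j^{y_j} * (1 - p_j)^{1 - y_j}."""
    prob = 1.0
    for y_j, p_j in zip(y, p):
        prob *= p_j if y_j == 1 else (1.0 - p_j)
    return prob

def average_cost(p, costs):
    """Eq. (9): E(g(y)) = sum_y g(y) p(y), with g(y) the summed costs of descriptors with y_j = 1."""
    return sum(sum(c for y_j, c in zip(y, costs) if y_j) * state_probability(y, p)
               for y in product([0, 1], repeat=len(p)))

def reproducibility_probability(p, costs, C):
    """Eq. (10): P(g(y) > C) = sum of p(y) over states whose total cost exceeds the level C."""
    return sum(state_probability(y, p)
               for y in product([0, 1], repeat=len(p))
               if sum(c for y_j, c in zip(y, costs) if y_j) > C)

# Example with M = 3 external descriptors (probabilities and costs are invented).
p = [0.1, 0.3, 0.05]        # probability that descriptor j is in state y_j = 1
costs = [4.0, 10.0, 1.0]    # cost c_j of compensating for descriptor j
print(average_cost(p, costs))                      # expected extra cost
print(reproducibility_probability(p, costs, 5.0))  # chance the cost exceeds C = 5
```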
E. Non-reproducible workflows

There is no method to make the workflow reproducible. In this case the scientific workflow probably contains a non-deterministic job or jobs.

V. REPRODUCIBILITY ANALYSIS

It may be important to inform the user about the reproducibility of his workflow and even about the cost of the reproducibility. Based on our mathematical model we can determine two measures related to the expected cost: the average cost and the reproducibility probability.

1. Average Cost (AC), expressed as

E(g(y)) = Σ_{y∈Y} g(y) p(y)    (9)

where g(y) = Σ_{i=1}^{K} c_i.

2. Reproducibility Probability (RP):

P(g(y) > C) = Σ_{Y: g(y)>C} p(y)    (10)

where C is a given level of the reproducibility cost.

VI. CONCLUSION

In this paper we investigated the possible types of scientific workflows from a reproducibility perspective. The basis of our analysis is the decay-parameter, which describes the type and the measure of the change of the descriptors' values. Based on this parameter we determined a cost function which expresses the "work" required to reproduce the given job or workflow. In this way we could classify the scientific workflows according to how they can be reproduced at a later time. For the different categories we set up methods to make the workflows reproducible, or we gave the probability and the extra cost of the reproducibility. Finally, we gave two general measures to evaluate the expected cost of the reproducibility. The goal of our research is to support scientists with methods to make their experiments reproducible and to provide information about the possibility of reproducing their workflows.

REFERENCES

[1] D. Koop, E. Santos, P. Mates, T. Vo Huy, P. Bonnet, B. Bauer, M. Troyer, D. N. Williams, J. E. Tohline, J. Freire and C. T. Silva, "A Provenance-Based Infrastructure to Support the Life Cycle of Executable Papers", International Conference on Computational Science, ICCS 2011. [Online]. Available: http://www.sciencedirect.com.
[2] A. Banati, P. Kacsuk and M. Kozlovszky, "Four level provenance support to achieve portable reproducibility of scientific workflows", in Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2015 38th International Convention on, pp. 241-244. IEEE.
[3] P. Missier, S. Woodman, H. Hiden and P. Watson, "Provenance and data differencing for workflow reproducibility analysis", Concurrency and Computation: Practice and Experience, 2013.
[4] R. D. Peng, "Reproducible Research in Computational Science", Science, vol. 334, no. 6060, pp. 1226-1227, Dec. 2011.
[5] J. P. Mesirov, "Accessible Reproducible Research", Science, vol. 327, no. 5964, pp. 415-416, Jan. 2010.
[6] D. De Roure, K. Belhajjame, P. Missier, J. M. Gómez-Pérez, R. Palma, J. E. Ruiz, K. Hettne, M. Roos, G. Klyne, C. Goble and others, "Towards the preservation of scientific workflows", in Procs. of the 8th International Conference on Preservation of Digital Objects (iPRES 2011). ACM, 2011.
[7] S. Woodman, H. Hiden, P. Watson and P. Missier, "Achieving reproducibility by combining provenance with service and workflow versioning", in Proceedings of the 6th Workshop on Workflows in Support of Large-Scale Science, 2011, pp. 127-136.
[8] A. Davison, "Automated Capture of Experiment Context for Easier Reproducibility in Computational Research", Computing in Science & Engineering, vol. 14, no. 4, pp. 48-56, Jul. 2012.
[9] J. Zhao, J. M. Gomez-Perez, K. Belhajjame, G. Klyne, E. Garcia-Cuesta, A. Garrido, K. Hettne, M. Roos, D. De Roure and C. Goble, "Why workflows break - Understanding and combating decay in Taverna workflows", in E-Science (e-Science), 2012 IEEE 8th International Conference on, 2012, pp. 1-9.
[10] K. M. Hettne, K. Wolstencroft, K. Belhajjame, C. A. Goble, E. Mina, H. Dharuri, D. De Roure, L. Verdes-Montenegro, J. Garrido and M. Roos, "Best Practices for Workflow Design: How to Prevent Workflow Decay", in SWAT4LS, 2012.
[11] P. Groth, E. Deelman, G. Juve, G. Mehta and B. Berriman, "Pipeline-centric provenance model", in Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science, 2009, p. 4.
[12] J. Freire, D. Koop, F. S. Chirigati and C. T. Silva, "Reproducibility Using VisTrails", 2014. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download doi:10.1.1.369.9566
[13] F. S. Chirigati, D. Shasha and J. Freire, "ReproZip: Using Provenance to Support Computational Reproducibility", in TaPP, 2013.
[14] V. Korolev, A. Joshi, M. A. Grasso, D. Dalvi, S. Das, Y. Yesha and others, "PROB: A tool for Tracking Provenance and Reproducibility of Big Data Experiments", Reproduce'14, HPCA 2014, vol. 11, pp. 264-286, 2014.
[15] D. Koop, J. Freire and C. T. Silva, "Enabling Reproducible Science with VisTrails", arXiv preprint arXiv:1309.1784, 2013.
[16] K. Belhajjame, O. Corcho, D. Garijo, J. Zhao, P. Missier, D. R. Newman, R. Palma, S. Bechhofer, G. C. Esteban, J. M. Gomez-Perez, G. Klyne, K. Page, M. Roos, J. E. Ruiz, S. Soiland-Reyes, L. Verdes-Montenegro, D. De Roure and C. Goble, "Workflow-centric research objects: First class citizens in scholarly discourse", in Proceedings of the ESWC2012 Workshop on the Future of Scholarly Communication in the Semantic Web, 2012.
[17] S. Bechhofer, D. De Roure, M. Gamble, C. Goble and I. Buchan, "Research objects: Towards exchange and reuse of digital knowledge", in The Future of the Web for Collaborative Science, 2010.
[18] K. Belhajjame, J. Zhao, D. Garijo, M. Gamble, K. Hettne, R. Palma and C. Goble, "Using a suite of ontologies for preserving workflow-centric research objects", Web Semantics: Science, Services and Agents on the World Wide Web, 2015.
[19] S. R. Piccolo, A. B. Lee and M. B. Frampton, "Tools and techniques for computational reproducibility", bioRxiv, 022707, 2015.
[20] I. Santana-Perez and M. S. Pérez-Hernández, "Towards Reproducibility in Scientific Workflows: An Infrastructure-Based Approach", Scientific Programming, vol. 2015, p. 11, 2015.
[21] A. Banati, P. Kacsuk and M. Kozlovszky, "Minimal sufficient information about the scientific workflows to create reproducible experiment", in IEEE 19th International Conference on Intelligent Engineering Systems (INES), Slovakia, 2015.
[22] A. Banati, P. Kacsuk and M. Kozlovszky, "Reproducibility analysis of scientific workflows", Acta Polytechnica Hungarica, unpublished.

Dynamic Execution of Scientific Workflows in Cloud

E. Kail(1), J. Kovács(2), M. Kozlovszky(1,2) and P. Kacsuk(2,3)
(1) Óbuda University, John von Neumann Faculty of Informatics, Biotech Lab, Bécsi str. 96/b, H-1034 Budapest, Hungary
(2) MTA SZTAKI, LPDS, Kende str. 13-17, H-1111 Budapest, Hungary
(3) University of Westminster, 115 New Cavendish Street, London W1W 6UW
{kail.eszter, kozlovszky.miklos}@nik.uni-obuda.hu, {jozsef.kovacs, kacsuk}@sztaki.mta.hu

Abstract - Scientific workflows have emerged in the past decade as a new solution for representing complex scientific experiments.
Generally, they are data- and compute-intensive applications and may need high-performance computing infrastructures (clusters, grids and clouds) to be executed. Recently, cloud services have gained widespread availability and popularity owing to their rapid elasticity and resource pooling, which is well suited to the nature of scientific applications that may experience variable demand and occasional spikes in resource usage. In this paper we investigate dynamic execution capabilities, focused on fault-tolerant behavior, in the Occopus framework, which was developed by SZTAKI and is targeted at providing automatic features for configuring and orchestrating distributed applications (so-called virtual infrastructures) on single- or multi-cloud systems.

I. INTRODUCTION

Over the last few years, cloud computing has emerged as a new model of distributed computing by offering hardware and software resources as virtualization-enabled services. Cloud providers give application owners the option to deploy their applications over a network with a virtually infinite resource pool, with modest operating costs and practically no investment costs. Today, cloud computing systems follow a service-driven, layered software architecture model, with Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). In this paper we focus primarily on IaaS cloud services. In an IaaS environment the CPU, storage, and network resources are supplied by a collection of data centers installed with hundreds to thousands of physical resources such as cloud servers, storage repositories, and the network backbone. It is the task of the cloud orchestrator to select the appropriate resource for an initiated application or service executed in the cloud.

Due to their rapid elasticity and almost infinite resource pooling capabilities, cloud services have also gained widespread popularity for enacting scientific experiments. Scientific experiments are widely performed in most scientific domains such as bioinformatics, earthquake science, astronomy, etc. In general they consist of multiple computing tasks that can be executed on distributed and parallel infrastructures. Scientific workflows are used to model these scientific experiments at a high level of abstraction. They are graphically represented by Directed Acyclic Graphs (DAGs), where the nodes are the computing tasks and the edges between them represent the data or control flow. Since these experiments mostly comprise compute- and data-intensive tasks, the execution of scientific workflows may last for weeks or even months and may manipulate terabytes of data. Thus scientific workflows should be executed in a dynamic manner in order to save energy and time.

Dynamic execution has three main aspects: fault tolerance, intervention and optimization techniques. Fault tolerance means continuing the execution with the required SLA even in the presence of failures, or adapting to new situations and actual needs during runtime. Since scientific workflows are mainly explorative by nature, scientists often need to monitor the execution, to get feedback about its status and to interfere with it. Intervention by the scientist, the workflow developer or the administrator may also be needed in a planned or in an ad-hoc manner. The third aspect of dynamic execution concerns optimization mechanisms, such as performance, budget, time or power optimization techniques.
In this paper we investigate the possibilities of executing scientific workflows dynamically in Occopus, we examine the extensions required to provide a reliable service for workflow orchestration, and we propose a fault-tolerant mechanism which is based on the workflow structure and on a replication technique. Occopus [7] is a newly introduced framework, developed by the Hungarian SZTAKI, targeted at providing automatic features for configuring and orchestrating distributed applications on single- or multi-cloud systems.

Our paper is structured as follows: in the next section we give a brief overview of related work on communication middleware used in distributed systems and on fault tolerance in the cloud. In Section III we introduce the Occopus framework in a more detailed fashion. In Section IV we analyze the possibility of executing workflows in Occopus, and Section V introduces our solution in detail. Finally, in the last section we give a brief insight into our fault-tolerance proposal, and the conclusion closes our work.

II. RELATED WORK

A. Communication middleware in the cloud

As distributed applications transcend geographical and organizational boundaries, the demands placed upon their communication infrastructures increase exponentially. Modern systems operate in complex environments with multiple programming languages, hardware platforms and operating systems, and with requirements for dynamic deployment and reliability while maintaining a high Quality of Service (QoS).

Cloud orchestration in general means building up and managing interconnections and interactions between distributed services on single- or multi-cloud systems. First the orchestrator allocates the most appropriate resource for a job from a resource pool, then it monitors the functioning of the resource with a so-called heartbeat mechanism. However, this mechanism only gives feedback about the physical status of the resources (CPU, memory usage, etc.); it cannot provide reliability in communication and data sharing. Concerning scientific workflows, the main challenge is to provide high availability, reliable communication, fault tolerance and SLA-based service. In most scientific workflow management systems a special middleware is responsible for maintaining the connections and data movement between the distributed services and for scheduling the tasks according to the available resources and predefined constraints (data-flow model). To provide reliable, flexible and scalable communication between the services, a suitable communication middleware is needed.

Communication middleware can be categorized as Remote Procedure Call (RPC) oriented middleware, Transaction-Oriented Middleware (TOM), Object-Oriented/Component Middleware (OOCM) and Message-Oriented Middleware (MOM) [6]. RPC-oriented middleware is based on a client-server architecture and provides remote procedure calls through APIs. This kind of communication is synchronous from the user's point of view, since the caller waits until the server returns a response; thus it does not enable a scalable and fault-tolerant solution for workflows [5]. Transaction-Oriented Middleware (TOM) is used to ensure the correctness of transaction operations in a distributed environment. It is primarily used in architectures built around database applications [14].
TOM supports synchronous and asynchronous communication among heterogeneous hosts, but due to the redundancies and control information attached to the pure data to ensure high reliability, it offers low scalability both in the data volume that can be handled and in the number of interacting actors. Object-Oriented/Component Middleware (OOCM) is based on object-oriented programming models and supports distributed object requests. OOCM is an extension of Remote Procedure Calls (RPC), and it adds several features that come from object-oriented programming languages, such as object references, inheritance and exceptions. These added features make OOCM flexible; however, this solution still enables only limited scalability. Message-Oriented Middleware (MOM) allows message passing across applications on distributed systems. A MOM provides several features such as:

• asynchronous and synchronous communication mechanisms
• data format transformation (i.e. a MOM can change the format of the data contained in the messages to fit the receiving application [16])
• loose coupling among applications
• parallel processing of messages
• support for several levels of priority.

Message passing is the ancestor of distributed interactions and one of the realizations of MOM. The producer and the consumer communicate by sending messages. They are coupled both in time and in space: they must both be active at the same time, the consumer receives messages by listening synchronously on a channel, and the recipient of a message is known to the sender. Message queues are newer solutions for MOM, where messages are concurrently pulled by consumers, as well as subscription-based exchange solutions allowing groups of consumers to subscribe to groups of publishers, resulting in a communication network or platform, or a message bus. Message queues provide an asynchronous communication protocol. Their widespread popularity lies not only in this asynchronous feature but in the fact that they provide persistence, reliability and scalability, enabling both time and space decoupling of the so-called publishers and consumers. The Advanced Message Queuing Protocol (AMQP) [1] is an open standard application layer protocol for message-oriented middleware. RabbitMQ [3] is an open source message broker (sometimes called message-oriented middleware) that implements AMQP and can easily be used on almost all major operating systems.

B. Fault tolerance in cloud

Although cloud computing has been widely adopted by industry, there are still many research issues to be fully addressed, such as fault tolerance, workflow scheduling, workflow management, security, etc. [8]. Fault tolerance is one of the key issues among these. It is a complex challenge to deliver the quality, robustness and reliability in the cloud that is needed for widespread acceptance as a tool for the scientists' community. To deal with this problem, much research has already been done on fault tolerance. Fault tolerance policies can be proactive or reactive. While the aim of proactive techniques is to avoid situations caused by failures by predicting them and taking the necessary actions, reactive fault tolerance policies reduce the effect of failures on application execution when the failure actually occurs. Different fault tolerance challenges and techniques (resubmission, checkpointing, self-healing, job migration, preemptive migration) have been implemented using various tools (HAProxy, Hadoop, SGuard) in the cloud. There are also many methods for providing fault-tolerant execution of scientific workflows in the cloud. Mostly they rely heavily on sophisticated and complex models of the failure behavior specific to the targeted computing environment. In our investigation we target a solution that is based mostly on the workflow structure and on data about actual execution timings retrieved from a provenance database.

III. OCCOPUS ARCHITECTURE

Occopus [7] (Fig. 1) has five main components: the enactor, which issues virtual machine management requests towards the infrastructure processor; the infrastructure processor, which is the internal representation of a virtual infrastructure (enabling the grouping of VMs serving a common aim); the cloud handler, which enables federated and interoperable cloud use by abstracting basic IaaS functionalities like VM creation; the service composer, which ensures that VMs meet their expected functionalities by utilizing configuration management tools; and finally the information broker, which decouples the information producer and consumer roles with a unified interface throughout the architecture.

Figure 1. Occopus architecture (components: compiler, enactor, infrastructure processor, info broker with info providers, node resolver [cloudinit, docker, cloudbroker], service composer [chef], cloud handler [boto, nova, docker, cloudbroker, occi]; inputs: infrastructure and node descriptions)

After receiving the required infrastructure description, the enactor immediately compiles it into an internal representation. It is the role of the enactor to forward and update the node requests to the infrastructure processor and to monitor the state of the infrastructure continuously during the setup and the existence of the infrastructure. This monitoring function is achieved with the help of the info broker. Among other things, this component is responsible for tracking the information flow between the nodes. If it notices a failure of a node or a connection, it notifies the enactor. The enactor then updates the infrastructure description and forwards it to the infrastructure processor. The infrastructure processor receives node creation and node destruction requests from the enactor. During creation, the infrastructure processor sends contextualized VM requests to the cloud handler. Within the contextualization information the processor places references to some of the previously created attributes of VMs. Node destruction requests are forwarded directly to the cloud handler component. The cloud handler, as its basic functionality, provides an abstraction over IaaS functionalities and allows the creation, monitoring and destruction of virtual machines. For these functionalities, it offers a plugin architecture that can be implemented with several IaaS interfaces (currently Occopus supports EC2, Nova, CloudBroker, Docker and OCCI interfaces). The main functionality of the Service Composer is the management of deployed software and its configuration at the node level. This functionality is well developed and even commercial tools are available to the public. The Occopus Service Composer component therefore offers interfaces to these widely available tools (e.g., Chef, Puppet, Docker, Ansible).
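The plugin idea behind the cloud handler can be illustrated with a generic sketch. The class and method names below are invented for illustration only and are not the Occopus API; they merely show how several IaaS backends can be hidden behind one small interface.

```python
# Generic illustration of a plugin-style cloud handler (invented names; this is
# NOT the Occopus API): each IaaS backend implements the same small interface,
# so the orchestrator can create, monitor and destroy VMs uniformly.
from abc import ABC, abstractmethod

class CloudHandlerPlugin(ABC):
    @abstractmethod
    def create_vm(self, contextualized_request: dict) -> str:
        """Start a VM from a contextualized request and return its node id."""

    @abstractmethod
    def vm_state(self, node_id: str) -> str:
        """Report the state of a VM (e.g. 'pending', 'running', 'failed')."""

    @abstractmethod
    def destroy_vm(self, node_id: str) -> None:
        """Terminate the VM."""

class DummyHandler(CloudHandlerPlugin):
    """A stand-in backend used only to show how a plugin slots in."""
    def __init__(self):
        self._vms = {}
    def create_vm(self, contextualized_request):
        node_id = f"vm-{len(self._vms)}"
        self._vms[node_id] = "running"
        return node_id
    def vm_state(self, node_id):
        return self._vms.get(node_id, "unknown")
    def destroy_vm(self, node_id):
        self._vms.pop(node_id, None)

handler: CloudHandlerPlugin = DummyHandler()   # an EC2/Nova/OCCI plugin would go here
vm = handler.create_vm({"image": "ubuntu", "context": "cloud-init script"})
print(handler.vm_state(vm))
```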
IV. SCIENTIFIC WORKFLOWS IN OCCOPUS

A. Scientific workflows

In the Occopus framework a virtual infrastructure can be built upon a directed acyclic graph representing some complex scientific experiment which consists of numerous computational steps. The connections between these computational steps represent the data dependencies, in other words the dataflow during the experiment. With Occopus, the infrastructure descriptor would contain the required resources for each task and also the SLA for the tasks or for the whole workflow. An SLA requirement could be a time or budget constraint or a need for green execution. In such a scenario the execution can be seen as data flowing across the VMs, starting from the entry task executed on the first VM and terminating with the exit task executed on the last VM. In a scenario like this every computational step is mapped to an individual VM. After submitting the virtual infrastructure descriptor based on the workflow model, Occopus supports the creation of such an infrastructure. This type of workflow creation and execution is called service choreography in related work.

1) Advantages

Executing scientific workflows in Occopus has several advantages. The resources are available continuously, which means that task executions are not forced to wait for a free resource. The infrastructure is built up easily, without expert knowledge of the individual cloud providers. Monitoring is also provided by the Occopus framework. There is no need for scheduling, and resource allocation is done by Occopus based on the virtual infrastructure descriptors.

2) Problems with scientific workflows executed in Occopus

Concerning scientific workflows, a number of issues may arise. Scientific experiments are data and compute intensive, may last for weeks or even months, and may use or produce terabytes of data. Due to the long execution time, many failures can occur. The types of faults that can arise during execution and need to be handled in order to provide a fault-tolerant execution fall mainly into the following categories: VM faults, programming faults, network issues, authentication problems, file staging errors, and data movement issues. In order not to lose the work already done, fault tolerance must be provided.

• When a node fails, the computation which was done by this node is lost. As described in the previous section, Occopus monitors the nodes of the virtual infrastructure, and when the enactor notices a failed node, it is deleted from the virtual infrastructure list and a new one is created. The execution can be restarted on it. But what should happen to the data that was consumed by the failed node?

• Let us focus on only one aspect of SLAs (Service Level Agreements), namely time constraints. Scientific workflows are often constrained by soft or hard deadlines. While a soft deadline means that the proposed deadline should be met with a probability p, a hard deadline means that the results are useless after the deadline. When there is a failure, upon recovery the makespan of the whole workflow increases and deadlines may no longer be met. How can it be ensured that SLAs are met?

• The fault tolerance technique itself should also be considered when executing scientific workflows in the cloud. The most frequent fault tolerance techniques in the cloud use resubmission and replicas.
How can it be ensured that more than one successor of the same type (replica) is able to receive the results of the predecessor(s), and how can it be ensured that the number of replicas can change dynamically in time?

In the next section we look for the best solution to address the issues described above.

V. SOLUTIONS

There are two widespread alternatives that can provide solutions for the above-mentioned problems and are supported by open source software. One of them is based on a service discovery feature, while the other uses a message queuing system.

A. Service Registry

Service discovery is a key component of most distributed systems and service-oriented architectures deploying many services. Service locations can change quite frequently due to host failure or replacement, similarly to a scientific workflow execution. A node must somehow discover the IP address and the port number of the peer application. One solution is to use a dedicated, centralized service registry node. A service registry is a database of services, their instances and their locations. Its main task is to register hosts, ports, authentication credentials, etc. Service instances are registered with the service registry on startup and deregistered on shutdown. Clients of a service query the service registry to find its available instances. If a node fails, the service registry database is updated with the new node.

Concerning workflow execution, if a node fails then the service registry updates its database with the new client, but the computation that was already done is lost, and the data consumed by the failed node is lost with it. If a computation has successfully terminated on a VM, then the results of this computation task can (and should) be stored in a provenance database, but because of the nature and size of the provenance database it must be located on permanent storage. Retrieving these data from this storage may involve high latency due to its geographic location, which can be far from the cloud provider. The flexibility of this solution is also limited. In this case the nodes know where to send data, so they use either synchronous remote procedure calls or asynchronous message passing middleware. As already mentioned in the related work, the RPC model does not support large volumes of data and does not support reliable transport of data. With message passing middleware the communication abstraction is the channel: a connection must be set up between producer and consumer, and the consumer listens on the channel synchronously. Using this solution, a special agent would be needed to orchestrate the execution of the workflow itself. Without this agent, this solution would work only for small workflows that do not move high volumes of data, do not need a long time to execute, and run on highly reliable resources.

B. Message Queuing

Using message queues would address almost all of the above-mentioned problems. The Advanced Message Queuing Protocol (AMQP) is an open standard message-oriented middleware protocol. In this approach the message producer does not send messages directly to a specific consumer; instead it characterizes messages into classes without knowledge of which consumers there may be. Similarly, consumers only receive messages that are of interest, without knowledge of the existing producers. AMQP operates over an underlying reliable transport layer protocol such as the Transmission Control Protocol (TCP).
The basic idea is that consumers and producers use a special node to accomplish message passing, which serves as a rendezvous point between the senders and receivers of messages. Its queues are buffers that temporarily or permanently store the messages. The middleware server has two main functionalities: one is buffering the messages in memory or on disk when the recipient cannot accept them fast enough, and the other is routing the messages to the appropriate queue. When a message arrives in a queue, the server attempts to deliver it to the consumer immediately. If this is not possible, the message is stored until the consumer is ready. If delivery is possible, the message is deleted from the buffer immediately or after the consumer has acknowledged it. The reliability lies in this feature: the acknowledgment can be sent only when the node has successfully processed the data. Scalability and fault tolerance can be realized by clustering nodes of the same type. In this solution the consumers and producers are not known to each other, and there can be more than one consumer belonging to a single queue. The power of AMQP comes from the ability to create queues, to route messages to queues, and even to create routing rules dynamically at runtime based on the actual environmental conditions. This feature would make it possible to realize an SLA-based, fault-tolerant execution for workflows.
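The queue-based decoupling described above can be sketched with the pika client for RabbitMQ; the queue name and payload are invented and error handling is omitted, so this is an illustration rather than the paper's implementation.

```python
# Sketch of queue-based decoupling with RabbitMQ and the pika client
# (queue name and payload are invented; error handling is omitted).
import pika

def process(payload):
    print("consuming task output:", payload)   # stand-in for the successor task's work

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="task_results", durable=True)   # survives broker restarts

# Producer side: a task (e.g. A1) publishes its result without knowing the consumers.
channel.basic_publish(
    exchange="",
    routing_key="task_results",
    body=b"output of task A1",
    properties=pika.BasicProperties(delivery_mode=2),        # persistent message
)

# Consumer side: acknowledge only after successful processing, so an unacknowledged
# message is redelivered if the consuming replica fails.
def on_message(ch, method, properties, body):
    process(body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_qos(prefetch_count=1)                           # one unacked message at a time
channel.basic_consume(queue="task_results", on_message_callback=on_message)
channel.start_consuming()                                     # blocks; would run on a task VM
```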
Figure 2. Possible architecture with Message Queueing

In Fig. 2 a possible architecture for executing scientific workflows with Occopus can be seen. The abstract model of the scientific workflow consists of three tasks (A1, A2 and A3), and for each of them an individual VM is started in the cloud. These are provided by the Occopus framework. Occopus also performs the monitoring of the resources, as well as the infrastructure upgrading. All of the tasks communicate through the MQ (message queue), so they are not aware of each other. The MQ can be positioned in the cloud as well, on a VM, or on external storage; this depends on the amount of data that must be shared between the tasks and on the geographic location of the VMs. The Agent is only responsible for monitoring the workflow execution according to the predefined constraints (the input and output format of the data, time constraints, etc.) and, according to the SLA, for requesting a virtual infrastructure change from the Occopus framework (for example to start more or fewer replicas of the tasks).

VI. FAULT TOLERANCE METHOD BASED ON WORKFLOW STRUCTURE

Concerning time-critical applications, a reliable fault-tolerance method should be provided. In this section we lay down the bases of our fault-tolerant framework, which uses replicas in order to ensure that time-critical workflows can be successfully terminated before the soft or hard deadline with a probability of p. In our solution every task in a workflow is assigned a certain number of replicas. The number of replicas is determined by the estimated execution time of a task, the structure of the workflow, the failure zone of a task (the tasks affected in the case of a failure of a given task), and the estimated failure detection time and resubmission time. Before Occopus starts to build the infrastructure, the algorithm is executed to determine the number of replicas of each task, and the infrastructure is built. When an unexpected situation occurs during execution (for example too many failures, or even no errors at all), the number of replicas can be changed accordingly, since Occopus is able to update the infrastructure. Determining the exact details of our algorithm is our future work.

VII. CONCLUSION

In this paper we have introduced Occopus, a one-click cloud orchestrator framework which supports distributed applications executed in single or multi-homed clouds. We have investigated the advantages and problems of executing scientific workflows with Occopus and have proposed a communication middleware which would provide a reliable, fault-tolerant workflow execution environment. We have given a first insight into a fault-tolerance method that can be used with Occopus as well, the detailed elaboration of which determines our future research direction.

REFERENCES

[1] AMQP Advanced Message Queuing Protocol, Protocol Specification, Version 0-9-1, 13 November 2008.
[2] E. Curry, "Message-Oriented Middleware", in: Q. H. Mahmoud (Ed.), Middleware for Communications, John Wiley and Sons, Chichester, England, 2004, pp. 1-28.
[3] A. Videla and J. W. Williams, RabbitMQ in Action: Distributed Messaging for Everyone, MEAP Edition, Manning Early Access Program, 2011.
[4] P. T. Eugster, P. Felber, R. Guerraoui and A.-M. Kermarrec, "The many faces of publish/subscribe", ACM Comput. Surv. 35 (2) (2003) 114-131.
[5] K. Geihs, "Middleware challenges ahead", IEEE Comput. 34 (6) (June 2001) 24-31.
[6] M. Albano et al., "Message-oriented middleware for smart grids", Computer Standards & Interfaces 38 (2015) 133-143.
[7] G. Kecskeméti, M. Gergely, A. Visegrádi, Zs. Németh, J. Kovács and P. Kacsuk, "One Click Cloud Orchestrator: bringing Complex Applications Effortlessly to the Clouds", WORKS 2014.
[8] A. Bala and I. Chana, "Fault Tolerance - Challenges, Techniques and Implementation in Cloud Computing", IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 1, No 1, January 2012.

FPGA Kernels for Classification Rule Induction

P. Škoda* and B. Medved Rogina*
* Ruđer Bošković Institute, Zagreb, Croatia
pskoda@irb.hr

Abstract - Classification is one of the core tasks in machine learning and data mining. One of several models of classification is classification rules, which use a set of if-then rules to describe a classification model. In this paper we present a set of FPGA-based compute kernels for accelerating classification rule induction. The kernels can be combined to perform specific procedures in the rule induction process, such as evaluating rule coverage or estimating out-of-bag error. Since classification problems are getting increasingly larger, there is a need for faster implementations of classification rule induction. One of the platforms that offers great potential for accelerating data mining tasks is the FPGA (field programmable gate array), which provides the means for implementing application-specific accelerators.

Key words - FPGA, dataflow, machine learning, classification rules

I. INTRODUCTION

Classification is one of the fundamental tasks in machine learning, and it is used in a wide variety of applications. One of the fundamental models of classification is classification rules [1]. Classification rules express the classification model as a set of IF-THEN rules, and they enable the implementation of fast classifiers with high throughput. Classification rule induction is usually performed in one of two ways: (a) by constructing a decision tree and then extracting rules from it; or (b) by using a covering algorithm.
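As a point of reference for option (b), a covering algorithm can be sketched generically as follows; this is a toy sequential-covering outline with invented data and single-condition rules, not the kernels presented in this paper.

```python
# Generic sequential-covering sketch (not the paper's algorithm): learn one
# single-condition rule at a time and remove the examples it covers.
def learn_one_rule(examples):
    """Pick the (attribute, value, class) test that covers the most examples correctly."""
    best, best_score = None, -1
    for x, label in examples:
        for attribute, value in x.items():
            covered = [(xi, yi) for xi, yi in examples if xi.get(attribute) == value]
            correct = sum(1 for _, yi in covered if yi == label)
            if correct > best_score:
                best, best_score = (attribute, value, label), correct
    return best

def covering(examples):
    """Induce an ordered rule list until every training example is covered."""
    rules = []
    remaining = list(examples)
    while remaining:
        attribute, value, label = learn_one_rule(remaining)
        rules.append((attribute, value, label))
        remaining = [(x, y) for x, y in remaining if x.get(attribute) != value]
    return rules

data = [({"color": "red", "size": "big"}, "A"),
        ({"color": "red", "size": "small"}, "A"),
        ({"color": "blue", "size": "big"}, "B")]
print(covering(data))   # e.g. [('color', 'red', 'A'), ('color', 'blue', 'B')]
```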
Applications of classification, as well as the field of data mining in general, are faced with a continuing increase in dataset sizes. This drives efforts to develop new, faster and more efficient algorithms, as well as to develop new hardware platforms that provide high computational power [2]. One hardware platform that is gaining ground in computing is the field programmable gate array (FPGA). FPGAs are digital integrated circuits designed to be user-configured after manufacturing [3], [4]. They enable fast development and deployment of custom digital hardware, and thus allow implementation of custom computational architectures that can be reconfigured on demand.

The most suitable computational model for FPGAs is the dataflow model. Most computer systems implement a control-flow model, in which the computation is described by a sequence of operations that are executed in a specified order. In the dataflow model, computation is described by a chain of transformations applied to a stream of data [5].

The dataflow is described in the form of a graph. Nodes of the graph represent operations, and edges represent data links between the nodes. The dataflow graph maps naturally to the hardware implementation. Nodes/operations are translated to functional hardware blocks, while edges are translated to data buses and signals. By dividing the graph into stages separated by registers, it is transformed into a pipelined structure suitable for implementation on an FPGA.

In this paper we present a novel set of kernels for accelerating classification rule induction that uses a variant of the covering algorithm. The kernels are targeted for a dataflow architecture realized on an FPGA.

(This work was supported in part by the Maxeler University Programme, and by the Croatian Science Foundation under project number I-1701-2014.)

II. RELATED WORK

While there has been a lot of activity in implementing classification rules on FPGAs, mostly for network filtering applications, there is no published work known to the authors related to rule learning. There is, however, some work on the closely related problem of decision tree induction [6]–[8].

Narayanan et al. [7] used an FPGA to implement a decision tree induction system for binary classification. They implemented the Gini impurity computation, as the most intensive part of the process. The rest of the algorithm is executed on a PowerPC CPU embedded in the FPGA device.

Chrysos et al. [8] implement frequency table computation on an FPGA as part of the HC-CART system for learning decision trees. Frequency table computation is implemented in the Frequency Counting module, which receives attribute–class label pairs. The pairs are used to generate addresses of frequency table locations, whose contents are then incremented by one. In their implementation, all attribute–class label pairs received by the kernel have to be prepared by a program running on the CPU, and then preloaded to memory attached to the FPGA. Input data is transferred from CPU to FPGA memory many times in the course of program execution.

In our previous work [9], we implemented a 2D frequency table computation using FPGAs. The kernel receives two input streams – one for addresses and one for class labels – and counts occurrences of pairs in the FPGA's internal memory. The dataset is held in FPGA-attached SDRAM. We built upon this work by expanding it to multiple attribute streams [10], which allows computation of several tables simultaneously, and by implementing efficient data transfer for the computed tables.
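As a point of reference for the kernels described below, the following Python sketch is a plain-software model of the pair counting that such a 2D frequency table kernel performs: attribute values and class labels are streamed in lock-step and the occurrences of each (attribute value, class label) pair are counted. The sample streams are invented, and the Counter merely stands in for the on-chip table.

from collections import Counter

def frequency_table(attribute_stream, class_stream):
    table = Counter()
    for attribute_value, class_label in zip(attribute_stream, class_stream):
        table[(attribute_value, class_label)] += 1   # one increment per pair
    return table

attributes = [0, 1, 1, 2, 1]
class_labels = [1, 0, 0, 1, 1]
table = frequency_table(attributes, class_labels)
print(table[(1, 0)])   # the pair (1, 0) occurs twice -> 2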
In this paper we present our work.

TABLE I. BASIC INFORMATION ON THE MAXELER VECTIS-LITE FPGA BOARD
FPGA: Xilinx Virtex-6 XCVSX475T
Off-chip RAM (LMem): 6 GiB (6×1 GiB) DDR3-800 SDRAM
On-chip RAM (FMem): ~4 MiB Block RAM
CPU ↔ FPGA bandwidth: 2 GB/s
LMem ↔ FPGA bandwidth: 38.4 GB/s max.

III. DATAFLOW ENGINE ARCHITECTURE

The accelerator is implemented in the form of a dataflow engine (DFE), which consists of at least one kernel and a manager. The computation is implemented by the kernels, while the role of the manager is to organize data movement within the DFE. Kernels are instantiated within the manager, and data links between the kernels, external RAM, and the CPU are defined.

The frequency table computation was implemented using the Maxeler platform [11]. The platform includes an FPGA board, drivers, and an API. The DFE is coded in Java, using the API, which provides objects and methods that are translated to hardware units by the compiler. The compiler translates the Java code into VHDL, and generates a software interface for the CPU code [12]. The DFE was implemented on the Maxeler Vectis-Lite board, which contains a single FPGA and on-board SDRAM. The board is connected to the host PC workstation via the PCIe bus. Basic information on the board is shown in Table I.

A. Count conditional kernel

The architecture of the ComputeFreq-MS kernel is shown in Fig. 1. The kernel has two scalar inputs for parameters: one for the number of items to process (items), and one for the stream length in items (strmLen). The stream length sets the number of items to read from the on-board SDRAM, which requires all accesses to be made in 96-byte blocks, i.e. 24-item blocks (4 bytes per item). The kernel processes the first items elements from the stream, while the rest are read from memory and ignored.

The kernel consists of six identical frequency counter structures. Each frequency counter receives two input streams – one for attribute and one for class values. Both streams carry 32-bit unsigned integers. The input streams are sliced so that the low NA bits are retained from the attribute values, and the low NC bits are retained from the class values. The retained bits are concatenated to form the frequency table address. In the table address, the attribute value is the high part, and the class value is the low part of the address word.

The kernel has a total of seven stream inputs: six for attribute value streams (att0 – att5), and one for the class value stream (class). Each attribute value stream is connected to a single frequency counter. The class value stream fans out to all frequency counters, so that they all receive identical class value streams.

B. Increment conditional kernel

The frequency tables are stored in on-chip memory – block RAM (BRAM). Each frequency counter has one BRAM, which is addressed by the address formed from the low bits of the attribute and class streams. The addressed location is read, its value incremented by one, and written back to the same location. The BRAM is configured as a single-port RAM with read-first synchronization. Due to access latencies of the BRAM itself, and the registers inserted by the compiler, there are latencies inside the loop. To ensure correct counting, the kernel's input streams are throttled to allow input of one element every m clock cycles, where m is the internal latency of the loop.

After all elements of interest are streamed in, the contents of the BRAMs are streamed out to the host main memory through stream output s. The output stream s is made by joining the output streams from all frequency counters (s0 – s5) into a single vector stream.
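The address formation just described can be mimicked in plain Python; the sketch below is only a behavioural model of a single frequency counter, assuming NA = NC = 6 (matching the 4096-entry table used later), and is not the hardware description itself.

NA, NC = 6, 6                         # low bits kept from attribute and class values
table = [0] * (1 << (NA + NC))        # 4096 counters, modelling one BRAM

def count_pair(attribute_value, class_value):
    # Keep the low NA bits of the attribute and the low NC bits of the class,
    # concatenate them (attribute high, class low) and increment that entry.
    addr = ((attribute_value & ((1 << NA) - 1)) << NC) | (class_value & ((1 << NC) - 1))
    table[addr] += 1                  # read-modify-write of the addressed location

for att, cls in [(3, 1), (3, 1), (17, 0)]:
    count_pair(att, cls)

print(table[(3 << NC) | 1])           # -> 2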
The output vector is padded to 8 elements, i.e. two additional dummy elements are added to it. The padding enables efficient conversion of the output stream to a word width of 128 bits, which is used in communication over the PCIe bus.

As the BRAMs are read, they are at the same time reset to zero. During the read phase, the BRAMs are addressed by a counter, and their write ports are switched from increment-by-one values to constant zero. In this case there are no loop dependencies, and the output stream is not throttled – it outputs one element every clock cycle. At the same time, any remaining elements in the input stream are read un-throttled (one per clock) and ignored.

C. Logic simple kernel

A single ComputeFreq-MS kernel is instantiated in the manager. Input streams attClass and att0 – att5 are linked to the on-board SDRAM. All input streams use a linear memory access pattern. Output stream s is linked to the computer's main memory. Source addresses for input streams att0 – att5 and attClass, values for the scalar inputs items and strmLen, and the destination address for output stream s are defined through the generated DFE interface.

Figure 1. ComputeFreq-MS kernel architecture for up to 64 unique attribute and class values (NA = 6, NC = 6)

TABLE II. FPGA RESOURCE USAGE BY THE DFE
Resource | Used | Total available | Utilization
LUTs | 34,724 | 297,600 | 11.67 %
Flip-flops | 50,064 | 297,600 | 16.82 %
DSP blocks | 0 | 2,016 | 0.00 %
Block RAM | 226 | 2,128 | 10.62 %

The kernel parameters were set for up to 64 unique attribute and class values (NA = 6, NC = 6). Each frequency table holds 4096 32-bit words. Due to the frequency counter's loop latency, the kernel can process one item (attribute–class pair) every five clock cycles. The kernel clock frequency is set to 300 MHz. With six attribute streams, the DFE can process up to 360×10^6 elements per second. FPGA resource usage for the DFE is given in Table II.

IV. EXPERIMENTAL RESULTS

A. Test environment

The kernel was benchmarked using code from the C4.5 Release 8 decision tree learning program [13]. The parts needed to load the datasets and to compute the frequency tables were extracted from the program and used for benchmarking.

The frequency table computation was parallelized using OpenMP [14] by distributing the attributes of the dataset over the threads. This included replicating the frequency table data structure to accommodate multithreaded execution. The original ComputeFrequencies function was modified to use the replicated data structure. Since the attributes were distributed over threads, multiple invocations of the ComputeFrequencies function were necessary if the number of attributes exceeded the number of threads.

Functions for transforming the dataset to the appropriate format, and for loading it to the DFE, were added to the benchmark program. For execution on the DFE, a new function was created by modifying the original ComputeFrequencies function. The modifications involve removing the computation loop and replacing it with function calls to the DFE. Received results are transformed into the frequency table data structure used by the parallelized software implementation. Since the DFE processes six attributes in parallel, the ComputeFrequencies function is invoked once for every group of six attributes. Time measurement was added to the ComputeFrequencies function. The entire run time of the function is measured.
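The host-side batching described above might look roughly like the following Python sketch. compute_frequencies_dfe is a hypothetical placeholder for the generated DFE interface (it is not the real API); the point is only that the attribute columns are handed to the accelerator in groups of six, one invocation per group.

STREAMS_PER_CALL = 6   # the DFE processes six attribute streams in parallel

def compute_frequencies_dfe(attribute_columns, class_column):
    # Placeholder: on the real system this call would stream the data through
    # the ComputeFreq-MS kernel and return one frequency table per attribute.
    return [None] * len(attribute_columns)

def compute_all_tables(attribute_columns, class_column):
    tables = []
    for start in range(0, len(attribute_columns), STREAMS_PER_CALL):
        group = attribute_columns[start:start + STREAMS_PER_CALL]
        tables.extend(compute_frequencies_dfe(group, class_column))
    return tables

# A dataset with 14 attributes needs 3 invocations (6 + 6 + 2 attributes).
columns = [[0, 1, 2]] * 14
print(len(compute_all_tables(columns, [0, 1, 0])))   # -> 14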
The function's throughput is calculated from the measured times using the formula:

T_{a,i} = (a · i) / t_{a,i}    (1)

where a is the number of attributes, i is the number of items, t_{a,i} is the function execution time, and T_{a,i} is the calculated function throughput for the dataset with a attributes and i items.

A set of 102 synthetic datasets was used in the benchmark. The number of attributes ranged from 6 to 1,536, in a geometric sequence 6×2^i, with i = 0, 1, ..., 8. The number of items ranged from 2,048 to 4×2^20, in a geometric sequence 2^i, with i = 11, 12, ..., 22. The dataset size was limited to 3 GiB, i.e. to 768×2^20 elements. Datasets were generated randomly, with uniformly distributed values in the range 1–63 for attribute values, and 0–64 for class values. The benchmark program was compiled using the gcc compiler, version 4.4.7. Benchmarks were run on an Intel Xeon E5-1600 workstation, with 16 GiB DDR3-1600 RAM, under CentOS 6.5 Linux.

B. CPU benchmark results

For the baseline CPU performance, the measurements were conducted on a six-threaded parallelized implementation of the ComputeFrequencies function. Measured execution times are given in Table III, and are shown in Fig. 2.

Figure 2. Execution time on CPU as a function of the number of items, measured for datasets with 6 to 1,536 attributes
Figure 3. Throughput on CPU as a function of the number of items, measured for datasets with 6 to 1,536 attributes

For datasets with over 128×2^10 items, execution time scales approximately linearly with the number of items. There is a larger increase in execution time in the region of 16×2^10 – 32×2^10 items for datasets with 96 or more attributes, and 32×2^10 – 256×2^10 items for datasets with fewer than 96 attributes. This increase is more clearly visible in the function throughput, shown in Fig. 3, as a sharp drop of the throughput. From the figure it is visible that for datasets with fewer items, the throughput increases with an increasing number of items. Once the dataset exceeds a certain number of items (dependent on the number of attributes), the throughput drops, and quickly stabilizes to an approximately constant value. As the number of attributes increases, the function throughput decreases, assuming an equal number of items. The number of items at which peak throughput is recorded also decreases with an increasing number of attributes, as does the peak throughput value itself. For datasets with 384 or more attributes, there is no increase of throughput; the throughput just drops from one approximately constant value to another.

The drop in function throughput is most likely a consequence of the interaction of the CPU's cache system and main SDRAM. Smaller datasets have a smaller likelihood of cache misses, which translates into higher throughput for smaller datasets. Another important factor is the data structure used for storing the dataset.
The dataset is stored in attribute-major order, which leads to memory accesses with stride. With more attributes the stride is larger, and consequently, so is the likelihood of a cache miss.

C. DFE benchmark results

The DFE was benchmarked under the same conditions as the six-threaded software implementation. Execution times measured on the DFE are given in Table IV, and shown in Fig. 4. Execution time on the DFE scales approximately linearly with the number of items in the dataset, on datasets with 512×2^10 or more items. On datasets with fewer items, the execution times asymptotically approach a certain minimal value, dependent on the number of attributes. For a fixed number of items, the execution times scale linearly with the number of attributes. The maximum measured constant (invariant to the number of items) throughput is 617×10^6 elements/s, achieved on the dataset with 12 attributes. The minimal measured constant throughput is 52.2×10^6 elements/s, measured on the dataset with 1,536 attributes.

Figure 4. Execution time on DFE as a function of the number of items, measured for datasets with 6 to 1,536 attributes
Figure 5. Throughput on DFE as a function of the number of items, measured for datasets with 6 to 1,536 attributes

The kernel throughput graph, shown in Fig. 5, shows that the curves for all numbers of attributes form a tight group. The curves for different numbers of attributes cannot be distinguished from one another. On datasets with 512×2^10 items, the throughput is approximately constant at 340×10^6 elements/s. The measured throughput is close to the theoretical maximum of 360×10^6 elements/s. The DFE performs better with datasets with a larger number of items, coming close to the theoretical maximum when the number of items is 512×2^10 or more. This is a consequence of communication and control overheads between the DFE and the host computer. The overhead is independent of the number of items, and its influence diminishes as the number of items increases. As can be seen from Table IV, execution times are approximately constant for datasets with 8,192 or fewer items. The minimum DFE execution time was calculated from these measurements, amounting to 801.1 μs.

D. Comparison of the results

DFE and CPU results were compared by computing the speedup, i.e. the ratio of execution times on the CPU and on the DFE. The speedup is shown in Fig. 6. For most of the test dataset sizes, execution on the DFE is slower than on the CPU. For datasets with fewer than 48 attributes, the DFE is slower no matter what the dataset size is. For 48 attributes the speedup exceeds unity for 128×2^10 items or more. The maximum speedup is achieved on datasets with 384 and more attributes. Speedups on these datasets are close in value, suggesting that further increasing the number of attributes will result in a negligible increase in speedup. These values can then be used as an approximation of the upper bound on the speedup. The maximum speedup measured is 6.26×, achieved on the dataset with 1,536 attributes and 512×2^10 items.

Figure 6. Speedup, measured for datasets with 6 to 1,536 attributes

The total performance of the DFE was estimated by calculating the weighted average of the speedups, weighted by the number of elements in the dataset:

S = (Σ_{a,n} a · n · s_{a,n}) / (Σ_{a,n} a · n)    (2)

where S is the average speedup, and s_{a,n} is the speedup on the dataset with a attributes and n items. The average speedup is 4.1.
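The weighted average defined in (2) can be reproduced in a few lines of Python; the (attributes, items, speedup) triples below are illustrative values only, not the measured results from Tables III and IV.

measurements = [
    # (number of attributes a, number of items n, speedup s on that dataset)
    (6, 4194304, 0.5),
    (384, 524288, 5.0),
    (1536, 524288, 6.26),
]

numerator = sum(a * n * s for a, n, s in measurements)
denominator = sum(a * n for a, n, _ in measurements)
print(numerator / denominator)   # element-weighted average speedup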
For further comparison, execution efficiency was computed for the CPU and the DFE. The efficiency is defined as:

E_{a,n} = (m · f) / F_{a,n}    (3)

where E_{a,n} is the efficiency and F_{a,n} is the throughput for the dataset with a attributes and n items, f is the clock frequency, and m is the number of threads or the number of streams on the CPU and the DFE, respectively. Peak throughput values were used to calculate the efficiency. The software implementation (CPU baseline) was executed on six threads, on a CPU clocked at 3.2 GHz. Its peak throughput was 875×10^6 elements/s. The DFE uses six streams, and was clocked at 300 MHz. Peak throughput on the DFE was 355×10^6 elements/s. Execution efficiency on the CPU is 21.9 clock cycles per element, and on the DFE it is 5.08 clock cycles per element. Of the 5.08 clocks, 4 are a consequence of the frequency counter's internal loop latency.

TABLE III. MEASURED EXECUTION TIMES ON CPU (columns: number of attributes)
Number of items | 6 | 12 | 24 | 48 | 96 | 192 | 384 | 768 | 1,536
2,048 | 50.59 μs | 105.3 μs | 215.0 μs | 434.9 μs | 916.9 μs | 1.769 ms | 6.049 ms | 8.887 ms | 21.32 ms
4,096 | 63.31 μs | 133.5 μs | 280.4 μs | 571.0 μs | 1.237 ms | 3.096 ms | 10.59 ms | 14.37 ms | 42.51 ms
8,192 | 88.21 μs | 190.1 μs | 403.6 μs | 847.8 μs | 2.228 ms | 5.027 ms | 19.56 ms | 37.46 ms | 89.34 ms
16,384 | 139.2 μs | 302.8 μs | 668.8 μs | 1.557 ms | 3.321 ms | 13.08 ms | 41.85 ms | 91.15 ms | 195.6 ms
32,768 | 245.6 μs | 538.0 μs | 1.219 ms | 2.867 ms | 13.99 ms | 46.18 ms | 156.3 ms | 360.6 ms | 803.1 ms
65,536 | 462.1 μs | 1.014 ms | 2.296 ms | 10.92 ms | 35.06 ms | 97.45 ms | 373.2 ms | 808.1 ms | 1.860 s
131,072 | 899.0 μs | 2.066 ms | 6.949 ms | 25.08 ms | 62.94 ms | 191.5 ms | 706.2 ms | 1.702 s | 3.795 s
262,144 | 2.056 ms | 5.184 ms | 14.48 ms | 49.73 ms | 132.1 ms | 382.5 ms | 1.518 s | 3.467 s | 7.658 s
524,288 | 4.215 ms | 10.35 ms | 28.61 ms | 98.80 ms | 267.1 ms | 766.7 ms | 3.079 s | 6.987 s | 15.44 s
1,048,576 | 8.365 ms | 20.48 ms | 56.80 ms | 197.3 ms | 537.8 ms | 1.553 s | 6.359 s | 14.01 s | –
2,097,152 | 16.70 ms | 40.80 ms | 113.3 ms | 394.1 ms | 1.071 s | 3.176 s | 13.21 s | – | –
4,194,304 | 33.49 ms | 81.27 ms | 226.3 ms | 787.0 ms | 2.139 s | 6.303 s | – | – | –

TABLE IV. MEASURED EXECUTION TIMES ON DFE (columns: number of attributes)
Number of items | 6 | 12 | 24 | 48 | 96 | 192 | 384 | 768 | 1,536
2,048 | 800.6 μs | 1.597 ms | 3.220 ms | 6.385 ms | 12.86 ms | 25.69 ms | 51.29 ms | 102.5 ms | 203.4 ms
4,096 | 817.1 μs | 1.634 ms | 3.248 ms | 6.559 ms | 13.00 ms | 25.90 ms | 51.89 ms | 104.2 ms | 209.2 ms
8,192 | 898.9 μs | 1.780 ms | 3.544 ms | 7.133 ms | 14.39 ms | 28.33 ms | 58.38 ms | 113.6 ms | 227.3 ms
16,384 | 1.031 ms | 2.014 ms | 4.043 ms | 8.187 ms | 16.47 ms | 33.04 ms | 65.85 ms | 131.9 ms | 260.4 ms
32,768 | 1.322 ms | 2.626 ms | 5.338 ms | 10.55 ms | 21.03 ms | 42.13 ms | 83.68 ms | 167.0 ms | 337.5 ms
65,536 | 1.929 ms | 3.758 ms | 7.576 ms | 15.54 ms | 30.35 ms | 60.91 ms | 123.2 ms | 243.9 ms | 483.4 ms
131,072 | 3.009 ms | 6.153 ms | 11.98 ms | 24.43 ms | 48.17 ms | 97.42 ms | 192.3 ms | 381.3 ms | 779.6 ms
262,144 | 5.277 ms | 10.41 ms | 20.70 ms | 42.10 ms | 83.98 ms | 169.3 ms | 338.4 ms | 675.5 ms | 1.356 s
524,288 | 9.664 ms | 19.35 ms | 38.62 ms | 77.18 ms | 154.0 ms | 308.5 ms | 616.5 ms | 1.236 s | 2.467 s
1,048,576 | 18.55 ms | 37.08 ms | 74.19 ms | 147.9 ms | 296.4 ms | 594.5 ms | 1.184 s | 2.374 s | –
2,097,152 | 36.03 ms | 72.09 ms | 144.2 ms | 288.3 ms | 576.1 ms | 1.154 s | 2.307 s | – | –
4,194,304 | 70.96 ms | 142.0 ms | 284.1 ms | 567.9 ms | 1.136 s | 2.272 s | – | – | –

V. CONCLUSION

In this paper we presented a multi-streamed compute architecture for 2D frequency matrix computation, implemented on an FPGA platform. This multi-streamed architecture is an advancement of our previous work [9]. Benchmark results reveal that the CPU outperforms the DFE for smaller datasets. This is a consequence of control and communication latencies between the FPGA board and the host computer. The minimal time required to execute any action on the DFE is approximately 800 μs, regardless of dataset size. Small datasets are easily processed by the CPU in less time. The DFE outperforms the CPU for larger datasets that have at least 48 attributes, 32×2^10 items, and 6×2^20 elements. The best speedup achieved by the DFE is 6.26×. The DFE is more efficient in processing data, requiring only 5.08 clock cycles per dataset element, while the CPU requires 21.9 when performing at peak efficiency. Of the DFE's 5.08 clock cycles, 4 cycles are a consequence of the frequency counter's internal loop latencies.

The kernel can be further improved by vectorizing its input streams in the same manner in which its output was vectorized. This will allow efficient processing of more elements in parallel, and better utilization of the available memory bandwidth. Another improvement would be reducing or eliminating the internal loop latencies.
This can potentially quadruple its performance, assuming that the operating frequency of the kernel does not drop due to higher design complexity. In future work, in addition to the stated improvements, this kernel will be integrated into the C4.5 program as an accelerator unit. However, to achieve significant performance gains, the communication latency between the host computer and the FPGA will have to be compensated for. This will most likely involve adding support for batch processing of a series of small datasets, and will require some modifications of the original algorithm. Overall, with additional improvements the kernel is expected to outperform the CPU for a wider range of dataset sizes.

REFERENCES

[1] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2nd ed. Morgan Kaufmann, 2006.
[2] A. N. Choudhary, D. Honbo, P. Kumar, B. Ozisikyilmaz, S. Misra, and G. Memik, "Accelerating data mining workloads: current approaches and future challenges in system architecture design," Wiley Interdiscip. Rev. Data Min. Knowl. Discov., vol. 1, no. 1, pp. 41–54, Jan. 2011.
[3] I. Kuon, R. Tessier, and J. Rose, "FPGA Architecture," Found. Trends Electron. Des. Autom., vol. 2, no. 2, pp. 153–253, 2008.
[4] M. L. Chang, "Device Architecture," in Reconfigurable Computing: The Theory and Practice of FPGA-based Computation, S. Hauck and A. DeHon, Eds. Morgan Kaufmann, 2008, pp. 3–27.
[5] J. B. Dennis, "Data Flow Supercomputers," Computer, vol. 13, no. 11, pp. 48–56, Nov. 1980.
[6] P. Škoda, B. Medved Rogina, and V. Sruk, "FPGA implementations of data mining algorithms," in MIPRO 2012, Proceedings of the 35th International Convention, 2012, pp. 362–367.
[7] R. Narayanan, D. Honbo, G. Memik, A. Choudhary, and J. Zambreno, "An FPGA Implementation of Decision Tree Classification," in 2007 Design, Automation & Test in Europe Conference & Exhibition, 2007, pp. 1–6.
[8] G. Chrysos, P. Dagritzikos, I. Papaefstathiou, and A. Dollas, "HC-CART: A parallel system implementation of data mining classification and regression tree (CART) algorithm on a multi-FPGA system," ACM Trans. Archit. Code Optim., vol. 9, no. 4, pp. 1–25, Jan. 2013.
[9] P. Škoda, V. Sruk, and B. Medved Rogina, "Frequency Table Computation on Dataflow Architecture," in MIPRO 2014, Proceedings of the 37th International Convention, 2014, pp. 357–361.
[10] P. Škoda, V. Sruk, and B. Medved Rogina, "Multi-stream 2D frequency table computation on dataflow architecture," in MIPRO 2015, Proceedings of the 38th International Convention, 2015, pp. 288–293.
[11] "Maxeler Technologies," 2015. [Online]. Available: http://www.maxeler.com/. [Accessed: 23-Dec-2015].
[12] O. Pell and V. Averbukh, "Maximum Performance Computing with Dataflow Engines," Comput. Sci. Eng., vol. 14, no. 4, pp. 98–103, Jul. 2012.
[13] J. R. Quinlan, "C4.5 Release 8," 1993. [Online]. Available: http://rulequest.com/Personal/c4.5r8.tar.gz. [Accessed: 23-Jun-2013].
[14] OpenMP Architecture Review Board, "OpenMP Application Program Interface," 2011. [Online]. Available: http://www.openmp.org/mp-documents/OpenMP3.1.pdf. [Accessed: 05-Feb-2014].

VISUALIZATION SYSTEMS

Prototyping of visualization designs of 3D vector fields using POVRay rendering engine

J. Opiła*
* AGH University of Science and Technology, Department of Applied Computer Science, Faculty of Management, Cracow, Poland
jmo@agh.edu.pl

Abstract – There is a persistent quest for novel methods of visualization in order to get insight into complex phenomena in a variety of scientific domains.
Researchers, e.g. the VTK team, have achieved excellent results; however, some problems connected with the implementation of new techniques and the quality of the final images still persist. Results of an inspection of a number of visualization styles for 3D vector fields employing the POVRay ray-tracing engine are discussed, i.e. hedgehogs, oriented glyphs, streamlines, the isosurface component approach and the texturing design. All styles presented have been tested using a water molecule model and compared with respect to computing time, informativeness and general appearance. It is shown in this work that the Scene Description Language (SDL), a domain specific language implemented in POVRay, is flexible enough to be used as a tool for fast prototyping of novel and exploratory visualization techniques. Visualizations discussed in the paper were computed using selected components of the API of ScPovPlot3D, i.e. templates written in the SDL language. Results are compared to designs already implemented in VTK.

Keywords – POVRay, vector field visualization, ScPovPlot3D, visual data analysis, VTK

(The work was supported by AGH University of Science and Technology.)

I. INTRODUCTION

In recent years both computational and experimental methods have been delivering large amounts of data. The most productive fields include astronomical sky surveys, engineering, econometrics, and the medical sciences including medical imaging, and this condensed list is hardly complete. Often the collected or computed data are in the form of a 3D vector field, static or dynamic, on all spatial scales, for example ocean currents, wind speed or electrostatic molecular-level fields. Vector fields can be described by differential equations, which sometimes can be reduced to a simple formula, e.g. for a gravitational or electrostatic field. However, in numerous cases they can be obtained only by measurements, for example wind distributions [1], [2]. Reliable analysis of such data requires intensive usage of computer data processing and a subsequent visualization step. After many years of development ([3], [4]), visualization of 3D vector fields may still be improved according to the characteristics of a specific case, e.g. by implementation of novel hybrid designs. Thus visualization lib