Splinter Meeting 2016: E-Science & Virtual Observatory

Machineries of Discovery

Annual Meeting of the Astronomische Gesellschaft 2016, Bochum, Germany

Date

Wednesday, 14.09.2016, 14:45 - 16:15 and 17:00 - 18:30

Thursday, 15.09.2016, 14:45 - 16:15 and 17:00 - 18:30

Convenors

H. Enke, K. Polsterer, J.K. Wambsganss


Abstracts

Kai Polsterer (HITS) Uncertain Photometric Redshifts

Photometric redshifts play an important role as a measure of distance for various cosmological topics. Instead of providing a point estimate only, astronomers have started to generate probability density functions (PDFs) which should provide a characterisation of the uncertainties of the estimation. In this talk we present two simple approaches to generating those PDFs.

We use the example of generating the photometric redshift PDFs of quasars from SDSS (DR7) to validate our approaches and to compare them with point estimates. We do not aim to present a new best-performing method; instead we choose an intuitive approach based on well-known machine learning algorithms. Furthermore, we introduce proper tools for evaluating the performance of PDFs in the context of astronomy: the continuous ranked probability score (CRPS) and the probability integral transform (PIT), both well accepted in the weather forecasting community.
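As a concrete illustration of the two scores named above, here is a minimal sketch (not the authors' implementation) of the closed-form CRPS and the PIT for Gaussian predictive PDFs; the toy redshift values and the width sigma = 0.1 are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import norm

def crps_gaussian(y, mu, sigma):
    # Closed-form CRPS for a Gaussian predictive PDF N(mu, sigma^2);
    # lower values mean sharper, better-calibrated forecasts.
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * norm.cdf(z) - 1.0)
                    + 2.0 * norm.pdf(z) - 1.0 / np.sqrt(np.pi))

def pit(y, mu, sigma):
    # Probability integral transform: the predictive CDF evaluated at the
    # observed value. For calibrated PDFs the PIT values are uniform on [0, 1].
    return norm.cdf((y - mu) / sigma)

# Toy experiment with perfectly calibrated forecasts (illustrative values).
rng = np.random.default_rng(0)
mu = rng.uniform(0.5, 2.5, size=10000)   # hypothetical photo-z estimates
sigma = 0.1                              # assumed predictive width
z_true = rng.normal(mu, sigma)           # "true" redshifts drawn from the forecasts
scores = crps_gaussian(z_true, mu, sigma)
pits = pit(z_true, mu, sigma)
```

For a perfectly calibrated Gaussian forecast the expected CRPS equals sigma/sqrt(pi), and a histogram of the PIT values should be flat; U- or hump-shaped PIT histograms reveal under- or over-dispersed PDFs.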

Gal Matijevic (AIP) Modeling Correlated Noise (in Stellar Spectra) with Gaussian Processes

Measuring a certain quantity in astrophysics usually means comparing a model of the underlying process to the collected data. This includes the atmospheric parameters that can be determined by minimizing the difference between a synthetic and an observed stellar spectrum. We can naively assume that the residuals between the two are normally distributed, so that the sigma-weighted sum of squared flux differences is a good cost function. Unfortunately, various additional sources contribute to the signal in the spectrum (such as scattered light, signal coming from the instrumentation, residuals from the reduction and processing, etc.), which make the difference between the observed and synthetic spectrum highly non-Gaussian. We will discuss how using Gaussian processes can help in learning these variations from the data themselves, and how adding this complication yields much less biased results, namely for the stellar temperatures and especially the elemental abundances. We will demonstrate this approach on the metal-poor sample of the RAVE spectroscopic survey.

Nicolai Bissantz (RUB) Uncertainty quantification in astronomical imaging and signal reconstruction

The question whether structural features in images or signals recovered from noisy data are of statistical significance, and therefore of scientific interest, or merely emerge from random noise is of great relevance in many practical applications such as image reconstruction or estimation of high-energy particle spectra.

In this talk we discuss several methods based on confidence intervals and uniform confidence bands for quantifying the uncertainty in recovered images and signals. The methods can be adapted to a variety of data acquisition situations and are applicable both to images or signals and to the differences between images or signals, e.g. taken at different points in time. The latter yields a criterion to assess time-resolved structural changes.

The proposed method is based on data reconstruction with a regularization technique as well as new theoretical results on uniform confidence bands for the function of interest in a two-dimensional heteroscedastic nonparametric convolution-type inverse regression model. In particular, a new uniform limit theorem for the case of Poisson-distributed observations is used, and a strong approximation result for a two-dimensional array of non-identically distributed Poisson residuals by an array of independent and identically distributed Gaussian random variables is derived. Additionally, a statistical test for the detection of modes is used to assess whether objects are separate. Finally, a data-driven selection method for the regularization parameter based on statistical multiscale methods is discussed. The method can be used for an automatic, data-driven analysis.

Nikos Gianniotis (HITS) A Neural Network Approach to Visualising Astronomical Time Series

We present a dimensionality reduction method that caters for the visualisation of time series. Often, the temporal nature of such data is only superficially taken into account, as the data are treated as vectors. This means that temporal behaviour in the time series is ignored, and therefore interesting aspects characterising the series are lost. In this work, we propose a dimensionality reduction algorithm that is aware of the temporal nature of the data, hence leading to more informative visualisations. The proposed algorithm involves an autoencoder coupled to a modified objective function.

We apply the proposed algorithm to data originating from the Kepler survey. In particular, we focus on objects that have not been classified before, by selecting objects that are unlikely to display periodic behaviour. Interestingly, the proposed visualisation displays a strong correlation between the variability of the objects and their physical properties.

Willem-Jan Vriend (RUG) The MUSE-WISE distributed data management system

MUSE is the next-generation massive integral field spectrograph for the VLT, covering 1'x1' at a sampling of 0.2", and began operations in 2014. In return for the major effort in building this instrument, the MUSE consortium has been awarded 250 nights of VLT time for guaranteed time observing (GTO).

Due to the nature of MUSE data, each data-cube obtained as part of the GTO program is used by most of the consortium institutes, which are spread across Europe. Since the effort required in reducing the data is significant, and to ensure uniformity in analysis, it is desirable to have a data management system that integrates data reduction, provenance tracking, quality control, and data analysis. Moreover, such a system should support the distribution of storage, processing, and quality control across all the consortium institutes. Here we present the MUSE-WISE system, which incorporates these aspects. It is built on the Astro-WISE system originally designed to handle OmegaCAM imaging data, which has been extended to support 3D spectroscopic data, including quality control of the data. MUSE-WISE was initially used to handle simulated MUSE data and laboratory test data. It is now being used to process GTO data. MUSE-WISE currently stores 178 TB, consisting of 54k raw exposures and processed data, used by 84 users spread over 7 nodes in Europe. We will present our experience of using MUSE-WISE to date and discuss the improvements planned for the future.

Ole Streicher (AIP) The Debian Astro project - A "Debian Pure Blend" for astronomy and astrophysics

Debian Astro is a unique attempt to improve the astronomy software ecosystem. Astronomy software with suitable licenses is packaged and distributed as integrated components of the Debian operating system. The use of Debian as an open distribution and development model, with its standardized build system, ensures reliability and stability of the distributed software as well as reproducibility, which is important in a scientific environment.

In my talk, I will present the project structure of the Debian Astro Pure Blend and its integration into the Debian distribution. I will also address how developers benefit from our approach.

Roland Winkler (AIP) TOAD: The 4MOST instrument model

The 4-metre Multi-Object Spectroscopic Telescope (4MOST) instrument will be mounted on the 4-metre VISTA telescope at Paranal. It uses 2436 individually positioned optical fibres to couple the light of targets into its spectrographs. The fibre positioner is mounted at the Cassegrain focus and is based on the Echidna tilting spines concept. The fibres are located in a hexagon-like structure with a diameter of 535 mm and cover a corresponding field of view on the sky of 2.5 deg diameter.

TOAD, the "Top Of the Atmosphere to Detector" simulator, is developed in parallel with the 4MOST instrument. The ultimate goal is to provide a detailed, end-to-end performance model of the 4MOST instrument. In TOAD, each input target light source is simulated individually, and all targets of one simulation run are combined on the CCD. An input target can be any light source, from point sources through extended sources, calibration lamps, sky or stray light, entering the system at virtually any point in the optical path. During the development of the 4MOST facility, the TOAD simulator gives invaluable insight into the interaction of various parts of the instrument and the impact of engineering design decisions on the system performance.

TOAD is implemented in Python, and the development process is designed so that TOAD is useful to the 4MOST project in all phases of its development. During the design process of 4MOST, each simulation run of TOAD is tracked by a ticketing system, which has already proved to be a very valuable decision.

One major problem for TOAD is the validation of the instrument model. We address this problem with unit tests for sub-components, comparisons with ZEMAX simulations, and comparisons with prototype tests from lab experiments. Finally, we put TOAD into the context of other simulators like the HARMONI simulator, SiMCADO (the simulator for MICADO), the METIS data simulator, and VIRTUAL MOONS.

Matthias Ammler-von Eiff (MPS) On the analysis of large data sets in the PLATO Data Center

PLATO (PLAnetary Transits and Oscillations of stars) is the M3 mission in ESA's Cosmic Vision 2015-2025 Programme and will launch in 2025. It is designed to detect and characterise a large sample of exoplanets down to the size of Earth. PLATO will monitor a large fraction of the sky and collect uninterrupted light curves of up to 1,000,000 stars for periods of up to 2-3 years at a cadence of 600 s and shorter. The PLATO Data Center (PDC) will generate the high-level scientific PLATO data products, which include the planet size, mass, and age. The generation, validation, and management of the data will be implemented in a system architecture that is distributed over Europe. In addition to the automatic pipeline processing for each target, there will be analyses of large data sets with dedicated tools for statistical analysis and data mining that will run at the Max Planck Institute for Solar System Research. VO capabilities will ensure that data access for scientists is standardised.

Jochen Klar (AIP) et al. RDMO - Research Data Management Organiser

Following the call to make the results of publicly funded research openly accessible, more and more funding agencies demand a data management plan (DMP) as part of the application process. The document specifies how the data management of the project is organized, which datasets will be published, and when. Of particular importance for European researchers is the Open Research Data Pilot of Horizon 2020, which requires data management plans for a set of 9 selected research areas. In order to assist researchers in creating these documents, several institutions have developed dedicated software tools, which focus on the assisted editing of the DMP templates provided by the particular funding agency. Beyond the purpose of fulfilling funder requirements, however, DMPs can be useful for a number of additional tasks, and could act as an information source for all stakeholders involved during the complete life cycle of the project. To address this, we are developing RDMO, a web application that enables the structured planning, implementation, and administration of research data management in a scientific project and, in addition, provides the scientist with a textual DMP.

Building upon a generic set of content, RDMO will be customizable to serve specific disciplinary and institutional requirements. The tool will not only be available at a central web site, but can also be installed and integrated into the existing infrastructure of a university or research institution. The tool will be multilingual, with a first version in English and German. Astronomy is one of the two fields for which we will prepare specific content for RDMO, covering topics like VO integration, astronomy-specific file formats, and infrastructure.

Markus Nullmeier (ZAH) Versatile access to HEALPix-based sky region objects within PostgreSQL databases with PgSphere

The PgSphere extension of the PostgreSQL relational database management system provides a natural and unified way to store and query spatial data on the celestial sphere, by providing additional data types such as spherical points, polygons and ellipses. Fast search capabilities are provided by appropriate spatial indexing methods within PostgreSQL. PgSphere has been used in many Virtual Observatory projects during the last ten years.

However, the spatial indexing methods that are used by PgSphere or similar software are effective only if the stored objects are relatively small with respect to the celestial sphere. Also, more and more astronomical data are provided in formats that are based on the spatial HEALPix discretisation, such as the MOC standard for sky coverage, published by the International Virtual Observatory Alliance (IVOA).

After a brief introduction to PgSphere, this talk reports on a project implementing HEALPix-based sky region data types such as MOC for PgSphere / PostgreSQL, where one key benefit is the availability of queries involving sky regions within the database query language. The second key aspect is the development and implementation of indexing methods appropriate for sky regions of any size.

Christian Dersch (UMbg) et al. A Python toolchain for variable star light curve analysis

Python has become more or less the standard programming language in astronomical data analysis. For light curve analysis, one has to perform several tasks to get from raw data to classification. Here, a Python-based open-source toolchain for light curve analysis is presented. It is centered around the modules of the SciPy stack and Astropy; remarkable contributions were made by VanderPlas et al. in the astroML project.

The presented toolchain consists of a set of functions for common preprocessing tasks like spectral analysis using the Lomb-Scargle periodogram, phasing of light curves, and calculation of Fourier components using least-squares optimization. As the last few years have shown, the incorporation of machine learning algorithms is a powerful tool for variable star analysis. The scikit-learn project provides a multipurpose toolbox for machine learning and data mining using Python and has thus been chosen as a central component for the classification of variable stars. It supports both supervised and unsupervised algorithms as well as projection methods like Principal Component Analysis (PCA). Finally, the toolchain is tested with a data subset of the OGLE-III survey, showing results comparable to previous analyses. The toolchain is presented as a set of Python modules with some additional IPython notebooks to show the application to example light curves.

Hendrik Heinl (ZAH) Exploring and mining the Gaia data with VO tools

With the first Gaia data release, astronomers will face the challenge of data-intensive science. The customary workflow of downloading whole catalogs and running the data through a local pipeline will have to change due to the sheer amount of data: downloading several TB of data will take at best several hours, on average days, and processing the data with local pipelines will also be time- and resource-consuming. To avoid these bottlenecks, astronomers should be enabled to select only the subset of the data they are actually interested in. The concept is: instead of bringing the data to the code, bring (parts of) the code to the data. For this, the Virtual Observatory developed the Table Access Protocol (TAP), which uses the Astronomical Data Query Language (ADQL) to bring code to the data and select subsets. I will present how to remotely explore the Gaia DR1 catalog and select a subset of the data using TOPCAT's TAP interface. In particular, I will demonstrate how to perform crossmatches between subsets of the Gaia catalog and surveys like 2MASS, SDSS, and RAVE.
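As a hedged illustration of the "bring the code to the data" idea: the table name follows the Gaia DR1 schema, the sky region is illustrative, and the pyvo call and service URL are assumptions shown commented out, since the same ADQL can equally be pasted into TOPCAT's TAP window.

```python
# An ADQL cone search selecting a small Gaia DR1 subset server-side,
# instead of downloading the full catalog (region values are illustrative).
adql = """
SELECT TOP 10000
       g.source_id, g.ra, g.dec, g.phot_g_mean_mag
FROM gaiadr1.gaia_source AS g
WHERE 1 = CONTAINS(POINT('ICRS', g.ra, g.dec),
                   CIRCLE('ICRS', 56.75, 24.12, 2.0))
"""

# The same query can be sent from Python (service URL is an assumption):
# import pyvo
# tap = pyvo.dal.TAPService("http://gaia.ari.uni-heidelberg.de/tap")
# table = tap.search(adql).to_table()
```

Only the selected rows and columns cross the network; the filtering runs inside the archive database.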

Michael Knörzer (Poster) (UTue) The Tübingen Model-Atom Database - A revised phosphorus model atom

The Tübingen Model-Atom Database (TMAD) is a service in the framework of the German Astrophysical Virtual Observatory (GAVO) that provides ready-to-use model atoms for the elements hydrogen to barium. We present a revised phosphorus model atom and its application in a preliminary spectral analysis of CPD-20°1123.

Peter Kroll, Frank Matthai (Poster) (Sonneberg) Digitization of Sonneberg Plate Archive - Current state and activities

Sonneberg Observatory houses a plate collection of about 275,000 photographic plates taken between 1923 and 2008. About 85% of the plates, with sizes from 6x6 cm^2 up to 30x30 cm^2, have been scanned since 2004, yielding approximately 15 TB of raw data. The database of image and log-book data and the processing pipeline are currently under construction.

Contact

If you have any further questions, please don't hesitate to contact us:

Harry Enke:     henke [at] aip [dot] de

Kai Polsterer: kai.polsterer [at] h-its [dot] org

Joachim Wambsganss: jkw [at] ari [dot] uni-heidelberg [dot] de