Argobots: A lightweight low-level threading and tasking framework S Seo, A Amer, P Balaji, C Bordage, G Bosilca, A Brooks, P Carns, ... IEEE Transactions on Parallel and Distributed Systems 29 (3), 512-526, 2017 | 122 | 2017 |
SLURM support for remote GPU virtualization: Implementation and performance study S Iserte, A Castelló, R Mayo, ES Quintana-Ortí, F Silla, J Duato, C Reaño, ... 2014 IEEE 26th International Symposium on Computer Architecture and High …, 2014 | 34 | 2014 |
Improving the User Experience of the rCUDA Remote GPU Virtualization Framework C Reaño, F Silla, A Castelló, AJ Peña, R Mayo, ES Quintana-Ortí, J Duato | 23 | 2014 |
A Review of Lightweight Thread Approaches for High Performance Computing A Castelló, AJ Peña, S Seo, R Mayo, P Balaji, ES Quintana-Ortí 2016 IEEE International Conference on Cluster Computing (CLUSTER 2016), 471-480, 2016 | 18 | 2016 |
On the use of remote GPUs and low-power processors for the acceleration of scientific applications A Castelló, J Duato, R Mayo, AJ Peña, ES Quintana-Ortí, V Roca, F Silla The Fourth International Conference on Smart Grids, Green Communications and …, 2014 | 15 | 2014 |
Theoretical Scalability Analysis of Distributed Deep Convolutional Neural Networks A Castelló, MF Dolz, ES Quintana-Ortí, J Duato 2nd High Performance Machine Learning Workshop (HPML 2019), 534-541, 2019 | 12 | 2019 |
GLTO: On the Adequacy of Lightweight Thread Approaches for OpenMP Implementations A Castelló, S Seo, R Mayo, P Balaji, ES Quintana-Ortí, AJ Peña International Conference on Parallel Processing (ICPP-2017), 60-69, 2017 | 11 | 2017 |
Enabling GPU Virtualization in Cloud Environments S Iserte, FJ Clemente-Castelló, A Castelló, R Mayo, ES Quintana-Ortí CLOSER 2016, 2016 | 11 | 2016 |
High Performance and Portable Convolution Operators for Multicore Processors P San Juan, A Castelló, MF Dolz, P Alonso-Jordá, ES Quintana-Ortí SBAC-PAD 2020, 2020 | 9 | 2020 |
GLT: A unified API for lightweight thread libraries A Castelló, S Seo, R Mayo, P Balaji, ES Quintana-Ortí, AJ Peña European Conference on Parallel Processing, 470-481, 2017 | 8 | 2017 |
PyDTNN: A user-friendly and extensible framework for distributed deep learning S Barrachina, A Castelló, M Catalán, MF Dolz, JI Mestre The Journal of Supercomputing 77 (9), 9971-9987, 2021 | 7 | 2021 |
Exploiting task-parallelism on GPU clusters via OmpSs and rCUDA virtualization A Castelló, R Mayo, J Planas, ES Quintana-Ortí The 1st IEEE International Workshop on Reengineering for Parallelism in …, 2015 | 7 | 2015 |
Analysis of model parallelism for distributed neural networks A Castelló, MF Dolz, ES Quintana-Ortí, J Duato Proceedings of the 26th European MPI Users' Group Meeting, 1-10, 2019 | 6 | 2019 |
Exploring the Suitability of Remote GPGPU Virtualization for the OpenACC Programming Model Using rCUDA A Castelló, R Mayo, ES Quintana-Ortí, AJ Peña, P Balaji 2015 IEEE International Conference on Cluster Computing (CLUSTER), 92-95, 2015 | 6 | 2015 |
POSTER: Boosting the performance of remote GPU virtualization using InfiniBand Connect-IB and PCIe 3.0 C Reaño, F Silla, AJ Peña, G Shainer, S Schultz, A Castelló, ... Cluster Computing (CLUSTER), 2014 IEEE International Conference on, 266-267, 2014 | 6 | 2014 |
On the adequacy of lightweight thread approaches for high-level parallel programming models A Castelló, R Mayo, K Sala, V Beltran, P Balaji, AJ Peña Future Generation Computer Systems 84, 22-31, 2018 | 5 | 2018 |
Programming parallel dense matrix factorizations with look-ahead and OpenMP S Catalán, A Castelló, FD Igual, R Rodríguez-Sánchez, ES Quintana-Ortí Cluster Computing 23 (1), 359-375, 2020 | 4 | 2020 |
Exploring the interoperability of remote GPGPU virtualization using rCUDA and directive-based programming models A Castelló, AJ Peña, R Mayo, J Planas, ES Quintana-Ortí, P Balaji The Journal of Supercomputing, 1-15, 2016 | 4 | 2016 |
A flexible research-oriented framework for distributed training of deep neural networks S Barrachina, A Castelló, M Catalán, MF Dolz, JI Mestre 2021 IEEE International Parallel and Distributed Processing Symposium …, 2021 | 3 | 2021 |
Anatomy of the BLIS family of algorithms for matrix multiplication A Castelló, ES Quintana-Ortí, FD Igual 2022 30th Euromicro International Conference on Parallel, Distributed and …, 2022 | 2 | 2022 |