The following chat session occurred on June 18th at the EE Times Multicore Virtual Conference. There were 35 people in attendance. I captured as many of the job titles as I could find. I also did some very light editing to make it more readable.
Jim Turley (chat moderator and conference co-chair): Hi, everyone. Jim Turley here.
Thad Meyer (a software engineer): Hi Jim
Jim Turley: Who's worked on a multicore project?
Grant Martin (from Tensilica): I have
Jwalant Desai (a manager at Wipro Technologies): I'm about to start
John Smith: not yet
Jim Turley: Grant, what kinds of problems did you have?
Daryl McDaniel (a senior systems engineer at Intel): Currently in process.
Cornelius Keck (a software engineer from Comnet International): Cheers!
Grant Martin: My main issues were in working with other people's software that did not use enough abstraction for synchronization.
Jim Turley: Not enough abstraction? Was the software too chip-specific?
Grant Martin: It was heterogeneous multicore using ASIPs so a rather ad-hoc synch mechanism was used. The SW needed to be chip specific in the computational cores but could have been more abstract in synch.
Jim Turley: For those of you just starting a project, what problems are you facing?
Jim Turley: Ah, so the sync mechanism was very specific.
Grant Martin: Yes, a bit too specific. But we learned from this and now have a better synch library.
Daryl McDaniel: Operating systems don't scale as the number of synchronization primitives increases (25,000+), and adding cores compounds the problem.
Jim Turley: Do you think that is something you could sell, or that others could use?
Grant Martin: I think that was me, it'll be in our next release.
Thad Meyer: Jim, you mean the synchronization library? Selling it as a standalone?
Jim Turley: Daryl, what OS are you using, or trying to use?
Daryl McDaniel: VxWorks, Linux, BSD Unix, Mac, Windows.
Jim Turley: Thad/Grant – yes. Do you think what you've learned could lead to a standalone product?
Jim Turley: Daryl, how many cores/chips do you want to use?
Grant Martin: We offer technologies tuned to our IP but try to use industry standards wherever we can (figuring out who is talking to who about what is a synch challenge, I can see!!)
Jim Turley: …and I'm guessing you're using Intel chips, Daryl? 😉
Daryl McDaniel: I am less concerned by the number of cores, but we must be able to scale to thousands of cores.
Jim Turley: Thousands! Yikes!
Daryl McDaniel: Yes, I've been using Intel chips for the last 20 years, through several companies.
Grant Martin: Daryl, will the thousands all be homogeneous or will they sometimes be heterogeneous?
Daryl McDaniel: Initial work is with homogeneous (it is easier) but I need to include capabilities for heterogeneous, both in instruction set and in core capabilities.
Jim Turley: Who has tried building homogeneous projects (i.e., all processors the same)?
Daryl McDaniel: By “core capabilities” I mean performance, extensions, and address space.
Jim Turley: Jwalant, what kinds of projects have you done?
Daryl McDaniel: My previous projects have involved homogeneous systems from two-processor SMP to 256-processor ccNUMA.
Jwalant Desai: I'm just beginning to use NVIDIA GP-GPU, using CUDA.
Jim Turley: Jwalant – that's a tough one. Has the processor been easy to understand?
Jwalant Desai: Earlier I worked on Texas Instruments OMAP processors and a proprietary dual-core DSP.
Jim Turley: With OMAP, at least you have two very different processors.
Jim Turley: Was that easier or harder than this?
Jim Turley: Daryl, what “broke” as you moved from 2 cores to 256?
Jim Turley: Anybody working with a multicore ASIC?
Daryl McDaniel: We would see application performance decrease as we moved to more than three cores. One OS was so slow with 30 processors that it was still trying to boot after two days. We killed it…
Jim Turley: 🙂
Jwalant Desai: Being a software (algorithms) person, I am currently focusing on how to “think parallel.” Did some analysis on LAPACK (linear algebra library) and algorithms used in image processing.
Jim Turley: Two days to boot?!! (Cue Windows joke here)
Jim Turley: Jwalant, how much have software tools helped you “think parallel?”
Daryl McDaniel: Current OSs seem to spend increasing amounts of time on processor management as the number of processors increases.
Kelvin Ang (student, University of the West of England): How different is LAPACK from the CUBLAS interface?
Jim Turley: Maybe we need a processor just to handle processor overhead.
Daryl McDaniel: The greatest problem is that global synchronization seems to take an inordinate amount of OS time.
Daryl McDaniel: PS: I am actually experimenting with running the kernel on its own core and everything else on the other cores.
Jim Turley: Sounds like local sync (if possible) would be more efficient than global sync?
Daryl McDaniel: Yes, except that some resources such as I/O require global synchronization since “modern” OSs don't understand I/O processors.
Jwalant Desai: I am not sure of tools helping to think parallel. Are you hinting at some specific tools?
Cornelius Keck: Correct. I/O has become simpler and simpler. Just think IDE.
Jwalant Desai: BLAS is actually a subset of LAPACK.
Jim Turley: But most I/O now is done with its own processor. How does this OS handle them?
Jim Turley: Jwalant – no, no hinting. What does everyone think about relearning how to “think parallel?” Are there tools that help?
Daryl McDaniel: I like to dedicate I/O to its own cores and then use light-weight IPC from the OS(s) running on the other cores.
Jim Turley: That's a good division of labor: I/O on one chip and computing on another.
Daryl McDaniel: Learning to “think parallel” is necessary and hard. There is a tool that comes with Wind River Tornado that allows inter-thread/processor operation to be visualized. I wish I had something like it for general-purpose use.
Jim Turley: Why wouldn't that work for you?
Daryl McDaniel: Jim, who was that last question addressed to?
Jim Turley: Daryl – why wouldn't the Wind River tool work for you?
Daryl McDaniel: The Wind River tool works fine for me, when I am using VxWorks. I haven't been lucky enough to be able to use it for several years.
Jim Turley: That makes sense.
Daryl McDaniel: The only reason I don't divide I/O and compute between cores nowadays is because of the need to provide device drivers optimized for an “intelligent I/O” environment. I don't have the resources to rewrite many drivers.
Kartik Talsania (a system engineer, modeling and ASICs): What is the most widely used package for mapping tasks to threads? Pthreads?
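For readers unfamiliar with the package Kartik mentions, here is a minimal sketch of mapping tasks to threads with POSIX threads. It is an illustration only, not anything the participants described: the `task_t` structure, the `run_task` and `parallel_sum` functions, and the task count of four are all hypothetical names and values chosen for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_TASKS 4 /* hypothetical task count, chosen for illustration */

/* One unit of work: sum the integers in [begin, end). */
typedef struct {
    long begin;
    long end;
    long result;
} task_t;

/* Thread entry point: runs a single task and stores its result. */
static void *run_task(void *arg)
{
    task_t *t = arg;
    t->result = 0;
    for (long i = t->begin; i < t->end; i++)
        t->result += i;
    return NULL;
}

/* Sum 0..n-1 by splitting the range into NUM_TASKS tasks, one thread each.
 * Returns -1 if a thread could not be created. */
long parallel_sum(long n)
{
    pthread_t threads[NUM_TASKS];
    task_t tasks[NUM_TASKS];
    long chunk, total = 0;
    int started = 0, failed = 0;

    assert(n >= 0);
    chunk = n / NUM_TASKS;

    for (int i = 0; i < NUM_TASKS; i++) {
        tasks[i].begin = i * chunk;
        /* The last task also picks up any remainder. */
        tasks[i].end = (i == NUM_TASKS - 1) ? n : (i + 1) * chunk;
        if (pthread_create(&threads[i], NULL, run_task, &tasks[i]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }

    /* Join only the threads that were started; each task owns its own
     * result slot, so no lock is needed until the totals are combined. */
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        total += tasks[i].result;
    }
    return failed ? -1 : total;
}
```

Calling `parallel_sum(1000)` adds 0 through 999 across four threads. On most toolchains the program is built with the `-pthread` flag.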
Jim Turley: Thanks, everyone. It's time to attend the Hardware panel discussion.
Jwalant Desai: As a beginner I am thinking of learning OpenMP for converting some existing serialized applications to parallel. Is that a good starting point?
Aydin Balkan (a software engineer from Synopsys): Jim and Daryl: Do you know of any general-purpose tools out there to visualize tasks running on threads, and their interactions? I have not heard of one so far.
Kartik Talsania: Thanks, Jwalant… I will use this as a starting point… seems as though everyone is bailing out…
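Jwalant's OpenMP question went unanswered in the session. As a rough sketch of what converting a serial loop looks like, the example below adds a single OpenMP directive to an ordinary dot-product loop. The `dot` function is a hypothetical example, not code from any participant, and it assumes the loop iterations are independent apart from the running sum.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two arrays of length n.
 * The pragma asks an OpenMP-enabled compiler to split the iterations
 * across threads and combine the per-thread partial sums at the end.
 * A compiler without OpenMP support ignores the pragma, and the loop
 * runs serially with the same result. */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;

    assert(a != NULL && b != NULL);

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++)
        sum += a[i] * b[i];

    return sum;
}
```

With GCC or Clang, OpenMP is typically enabled with `-fopenmp`. Because the serial code is left intact, this incremental style suits existing applications, though loops with dependencies between iterations need restructuring before a directive like this is safe.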