Why Your Analytics Has NOTHING On The Hadron Collider
So you think you know all about analytics? Patting yourself on the back about your latest cross-channel attribution modelling and the terabytes of data you’ve successfully corralled into a database? Time for a little perspective — because there’s big data and then there’s BIG data.
While brand managers the world over complain about the deluge of data they need to make sense of these days, data scientists at CERN are trying to solve the mysteries of the universe using facilities like the Large Hadron Collider (LHC), the world’s largest particle accelerator. They sift through billions of data points from a fire hose measurable in terabytes per second, and the data challenges they face dwarf those of most commercial entities.
Bob Jones, Project Leader at CERN, is a driving force behind CERN’s information management expertise and was the Head of CERN openlab between January 2012 and December 2014.
Among the many challenges Jones and his colleagues face is trying to gather insights from more than 20 petabytes of data from CERN’s Large Hadron Collider every year and, in particular, isolating the small number of particle collisions involving the elusive Higgs particle from the vast stream of event data.
To quantify the magnitude of this task, Jones explains: “During Run 1, the LHC produced six million billion proton-proton collisions… Of these, only around 400 produced results compatible with the Higgs particle, whose discovery was announced in July 2012. So you can see that identifying the right 400 events out of six million billion proton-proton collisions is really like looking for a needle in a haystack.”
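To put that haystack in numbers, here is a quick back-of-envelope sketch in Python. It uses only the two figures Jones quotes; the variable names are ours, chosen for illustration.

```python
# Figures quoted by Jones for LHC Run 1.
TOTAL_COLLISIONS = 6_000_000 * 1_000_000_000  # "six million billion" = 6e15
HIGGS_CANDIDATES = 400                        # "only around 400"

# Share of all collisions that looked compatible with the Higgs particle.
fraction = HIGGS_CANDIDATES / TOTAL_COLLISIONS

# How many collisions had to be sifted, on average, per candidate event.
collisions_per_candidate = TOTAL_COLLISIONS // HIGGS_CANDIDATES

print(f"Fraction of interesting events: {fraction:.1e}")
print(f"Collisions per candidate: {collisions_per_candidate:,}")
```

On those figures, roughly one collision in 15 trillion was a Higgs candidate.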
To manage this scale of data effectively, CERN has been a long-time champion of distributed processing and innovative data storage approaches.
According to Jones, “CERN and the physics community has been a driving force in the development of grid computing since the year 2001. This has led to the deployment of a production infrastructure with a global footprint known as the Worldwide LHC Computing Grid, or WLCG for short, which provides the resources to store, distribute and analyse the data from the LHC.”
Jones works in the IT department of CERN, which serves the whole organisation and has a critical role to play in supporting the scientific programme. Understandably, “It is a very demanding environment with continuous renewal and upgrades to services.”
Despite — or perhaps because of — this pressure, Jones says he is constantly impressed by the quality of the people he works with, including the numerous world-class experts on site, and their approach to tackling vast and complex tasks. “CERN has a university campus feel about it and the people are very open and willing to help and collaborate.”
We asked Jones to describe what for him and his colleagues are the biggest challenges of big data at CERN. The challenges are many and varied: “It is the combination of storage capacity, access patterns and sometimes unpredictable analysis workloads that are the biggest challenge,” he said.
As well as dealing with the voluminous data produced by CERN’s many experiments, the speed with which physicists develop and change focus in their experimental work adds to the data-management challenge. According to Jones, research moves quickly and those involved can’t always predict which particular dataset will prove to be the most popular and require the most resources.
“So our system needs to be very dynamic. We have a 3.5 MW computer centre on-site in Geneva and have leased space in a second computer centre at Wigner in Budapest, Hungary.
“We are ready for Run 2 of the LHC, which started in June 2015 with the experiments taking data at the unprecedented energy of 13 TeV, following the two-year long shutdown. We have multiple 100Gbps lines linking the two centres, which enables us to operate them as a single OpenStack cloud.”
And the volume of data is set to continue growing. The LHC experiments have already recorded 100 times more data for the summer conferences this year than they had around the same time after the LHC started up at 7 TeV in 2010, he says.
On the grid
While CERN originally hosted all IT services on-site in a traditional service provision model, as the needs of the scientific programme expanded it focused more heavily on developing off-site data processing and management capability through grid computing.
This approach has allowed CERN to federate IT resources from partner organisations around the world, as well as scale storage and processing power more efficiently. The cloud services market — originated by Amazon Web Services back in 2002 and much-loved by data scientists in publishing and other high-data-volume sectors — has proved a powerful tool for CERN as well.
“This market now offers us new opportunities to increase the scale and range of the IT services we can build on. We are working towards a hybrid cloud model, where we can opportunistically use any resources taking into account availability, price and policy.”
He said CERN is actively investigating this approach with cloud service companies and other research organisations in the context of the Helix Nebula initiative. “So seeking new opportunities and keeping flexible enough to profit from them is a key aspect of our strategy at CERN. But we have learnt that operating production services at this scale is not something that can be improvised.”
According to Jones, “Every time we have had to increase scale it has required development which takes time and advanced planning. Similarly, the importance of preserving data is paramount. CERN puts significant resources into bit-level preservation of data, including the use of tape systems where the technology continues to evolve.”
This archived data must be actively managed to ensure it remains available for future use, he said. However, a balancing act is required to ensure CERN holds on to meaningful scientific data without unintentionally housing excessive volumes of meaningless information.
“At this scale it is not possible to keep all the data (the LHC produces up to a Petabyte of data per second) and it is essential to have efficient data-filtering mechanisms so that we can separate the wheat from the chaff. A key risk is throwing away the data you need and cannot reproduce.”
We also asked Jones if the approach to data analysis or exploration is very different when dealing with the vast quantities of data from the Large Hadron Collider or if the thought processes are similar to smaller-scale experiments, just executed using tools capable of handling greater scale.
“The volume of data produced at the LHC is a challenge, but the process is similar for smaller-scale experiments,” he said.
“Particles collide at high energies inside the detectors, creating new particles that decay in complex ways as they move through layers of subdetectors. The subdetectors register each particle’s passage and microprocessors convert the particles’ paths and energies into electrical signals, combining the information to create a digital summary of the collision event.”
The raw data per event is around one million bytes (1 MB), produced at a rate of about 600 million events per second. The Worldwide LHC Computing Grid tackles this mountain of data in a two-stage process.
First, it runs dedicated algorithms written by physicists to reduce the number of events and select those considered interesting — a sophisticated winnowing out of noise from the data sets. “Analysis can then focus on the most important data — that which could bring new physics measurements or discoveries.”
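The figures quoted above can be checked with some simple arithmetic. The sketch below is ours, not CERN’s: it assumes decimal units (1 MB = one million bytes, as the article states) and treats the “more than 20 petabytes” a year mentioned earlier as a round 20 PB.

```python
# Figures quoted in the article.
BYTES_PER_EVENT = 1_000_000          # "around one million bytes (1 MB)"
EVENTS_PER_SECOND = 600_000_000      # "about 600 million events per second"
ANNUAL_KEPT_BYTES = 20 * 10**15      # assumption: a round 20 PB per year

# Raw data rate before any filtering.
rate = BYTES_PER_EVENT * EVENTS_PER_SECOND   # bytes per second
rate_tb = rate / 1e12                        # terabytes per second
rate_pb = rate / 1e15                        # petabytes per second

# How many seconds of unfiltered output would equal a year's kept data.
seconds_to_fill = ANNUAL_KEPT_BYTES / rate

print(f"Raw rate: {rate_tb:.0f} TB/s ({rate_pb:.1f} PB/s)")
print(f"Seconds of raw output equal to 20 PB: {seconds_to_fill:.1f}")
```

That works out to about 600 terabytes per second, consistent with the “up to a Petabyte of data per second” Jones mentions, and it means a full year’s worth of kept data corresponds to barely half a minute of unfiltered output. That gap is why the event-selection step matters so much.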
When it comes to analytical tools, it’s unsurprising to hear that CERN’s data scientists have built and tweaked their own analytical toolkit. “The physics community has progressively developed, over a number of years, a set of software tools dedicated to this task. These tools are constantly being improved to ensure they continue to work at the growing scale of the LHC data challenge. ROOT is a popular data-analysis framework; it is a bit like R on steroids.”
Jones says that grid computing has also been immensely helpful in enabling physicists to run analysis at scale. “Grid computing helps by providing an underlying global infrastructure with the capacity to be able to match the analysis needs of the LHC. But the grid itself is evolving to make more use of cloud computing techniques and profit from the improvements in hardware (processors, storage etc.) as well as the cost-effectiveness of high performance networks.”
Lessons for brands
The CERN Data Centre has the ability to process incredibly high throughput in order to manage the data coming out of the Large Hadron Collider. That prompts the question of whether there will be many situations in which the commercial sector would need that extreme throughput capability.
According to Jones, “CERN is a leader but not alone in having to deal with such high data throughputs. We expect to see similar scales in other sciences (such as next generation genome sequencing as well as the Square Kilometre Array which will primarily be deployed in Australia and South Africa) and various business sectors linked to the growing Internet of Things in the near future.”
He described CERN as being ahead of the curve, and said the technologies and processes developed — as well as the lessons learned — at CERN can be applied in other fields. However, he emphasised that CERN’s advanced capabilities are not acquired by happenstance — the organisation spends a great deal of effort in growing the skills needed to develop cutting-edge data solutions.
“Education is a key element of CERN’s mission. For those working at CERN, we have technical and management training programmes and series of computing seminars as well as the CERN School of Computing. We are constantly recruiting young scientists, engineers and technicians who also bring new skills and ideas into CERN’s environment. Engagement with leading IT companies through CERN openlab has been a source of many new developments and helps train successive generations of personnel in the latest techniques,” he said.
CERN is also a poster child for the power of not only open source but also a culture of organisational openness. Jones said, “CERN’s open culture coupled with developments such as commercial cloud services where an organisation’s data may be stored off-site, and a Bring Your Own Device (BYOD) policy for the site, means we have to be proactive to ensure everyone respects intellectual property rights and the relevant data protection legislation.
“We are also active in the deployment of federated identity-management systems for access to IT services, and such a model has been in place for the Worldwide LHC Computing Grid since its creation.”
This article originally appeared at www.which-50.com